Abstract



We develop a new technique that allows us to show in a unified way that many well-known
combinatorial theorems, including Turán's theorem, Szemerédi's theorem and Ramsey's theorem,
hold almost surely inside sparse random sets. For instance, we extend Turán's theorem to the
random setting by showing that for every $\epsilon > 0$ and every positive integer $t \geq 3$ there exists a
constant $C$ such that, if $G$ is a random graph on $n$ vertices where each edge is chosen independently
with probability at least $Cn^{-2/(t+1)}$, then, with probability tending to 1 as $n$ tends to infinity, every
subgraph of $G$ with at least $\left(1 - \frac{1}{t-1} + \epsilon\right) e(G)$ edges contains a copy of $K_t$. This is sharp up to
the constant $C$. We also show how to prove sparse analogues of structural results, giving two main
applications, a stability version of the random Turán theorem stated above and a sparse hypergraph
removal lemma. Many similar results have recently been obtained independently in a different way
by Schacht and by Friedgut, Rödl and Schacht.

1 Introduction 

In recent years there has been a trend in combinatorics towards proving that certain well-known
theorems, such as Ramsey's theorem, Turán's theorem and Szemerédi's theorem, have "sparse random"
analogues. For instance, the first non-trivial case of Turán's theorem asserts that a subgraph of $K_n$
with more than $\lfloor n/2 \rfloor \lceil n/2 \rceil$ edges must contain a triangle. A sparse random analogue of this theorem
is the assertion that if one defines a random subgraph $G$ of $K_n$ by choosing each edge independently
at random with some very small probability $p$, then with high probability every subgraph $H$ of $G$ such
that $|E(H)| \geq \left(\frac{1}{2} + \epsilon\right) |E(G)|$ will contain a triangle. Several results of this kind have been proved,
and in some cases, including this one, the exact bounds on what $p$ one can take are known up to a
constant factor.

The greatest success in this line of research has been with analogues of Ramsey's theorem [42]. 
Recall that Ramsey's theorem (in one of its many forms) states that, for every graph H and every 
natural number r, there exists n such that if the edges of the complete graph Kn are coloured with 
r colours, then there must be a copy of H with all its edges of the same colour. Such a copy of H is 
called monochromatic. 

Let us say that a graph $G$ is $(H, r)$-Ramsey if, however the edges of $G$ are coloured with $r$ colours,
there must be a monochromatic copy of $H$. After efforts by several researchers [6, 38, 44, 45, 46], most
notably Rödl and Ruciński, the following impressive theorem, a "sparse random version" of Ramsey's
theorem, is now known. We write $G_{n,p}$ for the standard binomial model of random graphs, where each
edge is chosen independently with probability $p$. We also write $v_G$ and $e_G$ for the number of vertices
and edges, respectively, in a graph $G$.



Theorem 1.1. Let $r \geq 2$ be a natural number and let $H$ be a graph that is not a star forest. Then
there exist positive constants $c$ and $C$ such that

$$\lim_{n \to \infty} \mathbb{P}(G_{n,p} \text{ is } (H, r)\text{-Ramsey}) = \begin{cases} 0, & \text{if } p < cn^{-1/m_2(H)}, \\ 1, & \text{if } p > Cn^{-1/m_2(H)}, \end{cases}$$

where

$$m_2(H) = \max_{K \subset H,\ v_K \geq 3} \frac{e_K - 1}{v_K - 2}.$$

That is, given a graph $H$ which is not a disjoint union of stars, there is a threshold at approximately
$p = n^{-1/m_2(H)}$ where the probability that the random graph $G_{n,p}$ is $(H, r)$-Ramsey changes from
0 to 1.

This theorem comes in two parts: the statement that above the threshold the graph is almost 
certainly $(H, r)$-Ramsey and the statement that below the threshold it almost certainly is not. We
shall follow standard practice and call these the 1-statement and the 0-statement, respectively. 

There have also been some efforts towards proving sparse random versions of Turán's theorem,
but these have up to now been less successful. Turán's theorem [60], or rather its generalization, the
Erdős-Stone-Simonovits theorem (see for example [2]), states that if $H$ is some fixed graph, then any
graph with $n$ vertices that contains more than

$$\left(1 - \frac{1}{\chi(H) - 1} + o(1)\right) \binom{n}{2}$$

edges must contain a copy of $H$. Here, $\chi(H)$ is the chromatic number of $H$.

Let us say that a graph $G$ is $(H, \epsilon)$-Turán if every subgraph of $G$ with at least

$$\left(1 - \frac{1}{\chi(H) - 1} + \epsilon\right) e(G)$$

edges contains a copy of $H$. One may then ask for the threshold at which a random graph becomes
$(H, \epsilon)$-Turán. The conjectured answer [34] is that the threshold is the same as it is for the corresponding
Ramsey property.

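Conjecture 1.2. For every $\epsilon > 0$ and every graph $H$ there exist positive constants $c$ and $C$ such that

$$\lim_{n \to \infty} \mathbb{P}(G_{n,p} \text{ is } (H, \epsilon)\text{-Turán}) = \begin{cases} 0, & \text{if } p < cn^{-1/m_2(H)}, \\ 1, & \text{if } p > Cn^{-1/m_2(H)}, \end{cases}$$

where

$$m_2(H) = \max_{K \subset H,\ v_K \geq 3} \frac{e_K - 1}{v_K - 2}.$$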



A difference between this conjecture and Theorem 1.1 is that the 0-statement in this conjecture
is very simple to prove. To see this, suppose that $p$ is such that the expected number of copies of $H$
in $G_{n,p}$ is significantly less than the expected number of edges in $G_{n,p}$. Then we can remove a small
number of edges from $G_{n,p}$ and get rid of all copies of $H$, which proves that $G_{n,p}$ is not $(H, \epsilon)$-Turán.
The expected number of copies of $H$ (if we order the vertices of $H$) is approximately $n^{v_H} p^{e_H}$, while
the expected number of edges in $G_{n,p}$ is approximately $pn^2$. The former becomes less than the latter
when $p = n^{-(v_H - 2)/(e_H - 1)}$.

A further observation raises this bound. Suppose, for example, that $H$ is a triangle with an
extra edge attached to one of its vertices. It is clear that the real obstacle to finding copies of $H$ is
finding triangles: it is not hard to add edges to them. More generally, if $H$ has a subgraph $K$ with
$\frac{e_K - 1}{v_K - 2} > \frac{e_H - 1}{v_H - 2}$, then we can increase our estimate of $p$ to $n^{-(v_K - 2)/(e_K - 1)}$, since if we can get rid of
copies of $K$ then we have got rid of copies of $H$. Beyond this extra observation, there is no obvious
way of improving the bound for the 0-statement, which is why it is the conjectured upper bound as
well.
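The quantity $m_2(H)$ that governs these thresholds is easy to compute for small graphs. The following is a minimal sketch, not taken from the paper: the function names `m2` and `complete_graph` and the vertex labelling are illustrative assumptions. It scans only induced subgraphs, since for a fixed vertex set the ratio $(e_K - 1)/(v_K - 2)$ is largest when all available edges are kept.

```python
from fractions import Fraction
from itertools import combinations


def m2(vertices, edges):
    """Return max (e_K - 1)/(v_K - 2) over subgraphs K of H with v_K >= 3.

    Only induced subgraphs are scanned: on a fixed vertex set the ratio is
    maximised by keeping every edge of H inside that set.
    Assumes H has at least three vertices.
    """
    edge_set = {frozenset(e) for e in edges}
    best = None
    for size in range(3, len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            e_k = sum(1 for e in edge_set if e <= chosen)
            ratio = Fraction(e_k - 1, size - 2)
            if best is None or ratio > best:
                best = ratio
    return best


def complete_graph(t):
    """Vertex list and edge list of K_t (hypothetical helper for the example)."""
    vertices = list(range(t))
    return vertices, list(combinations(vertices, 2))


# Triangle {0, 1, 2} with a pendant edge {2, 3}, as in the example above.
pendant_vertices = [0, 1, 2, 3]
pendant_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]

print(m2(pendant_vertices, pendant_edges))  # the triangle gives the maximum
print(m2(*complete_graph(5)))               # agrees with (t + 1)/2 for t = 5
```

For the triangle with a pendant edge, the whole graph gives the ratio 3/2 while the triangle alone gives 2, so the maximum is attained on the triangle, matching the remark that triangles are the real obstacle.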

An argument along these lines does not work at all for the Ramsey property, since if one removes 
a few edges in order to eliminate all copies of H in one colour, then one has to give them another 
colour. Since the set of removed edges is likely to look fairly random, it is not at all clear that this 
can be done in such a way as to eliminate all monochromatic copies of H. 

Conjecture 1.2 is known to be true for some graphs, for example $K_3$, $K_4$ and $K_5$ (see [6, 34, 16],
respectively) and all cycles (see [11, 24, 25]), but it is open in general. Some partial results towards
the general conjecture, where the 1-statement is proved with a weaker exponent, have been given by
Kohayakawa, Rödl and Schacht [35] and Szabó and Vu [55]. The paper of Szabó and Vu contains the
best known upper bound in the case where $H$ is the complete graph $K_t$ for some $t \geq 6$; the bound they
obtain is $p = n^{-1/(t - 1.5)}$, whereas the conjectured best possible bound is $p = n^{-2/(t+1)}$ (since $m_2(K_t)$
works out to be $(t + 1)/2$). Thus, there is quite a significant gap. The full conjecture has also been
proved to be a consequence of the so-called KLR conjecture [34] of Kohayakawa, Łuczak and Rödl,
but this conjecture, regarding the number of $H$-free graphs of a certain type, remains open, except in
a few special cases [14, 15, 16, 36].

As noted in [32, 34], the KLR conjecture would also imply the following structural result about
$H$-free graphs which contain nearly the extremal number of edges. The analogous result in the dense case,
due to Simonovits [54], is known as the stability theorem. Roughly speaking, it says that if an $H$-free
graph contains almost $\left(1 - \frac{1}{\chi(H) - 1}\right) \binom{n}{2}$ edges, then it must be very close to being $(\chi(H) - 1)$-partite.

Conjecture 1.3. Let $H$ be a graph with $\chi(H) \geq 3$ and let

$$m_2(H) = \max_{K \subset H,\ v_K \geq 3} \frac{e_K - 1}{v_K - 2}.$$

Then, for every $\delta > 0$, there exist positive constants $\epsilon$ and $C$ such that if $G$ is a random graph on $n$
vertices, where each edge is chosen independently with probability $p$ at least $Cn^{-1/m_2(H)}$, then, with
probability tending to 1 as $n$ tends to infinity, every $H$-free subgraph of $G$ with at least
$\left(1 - \frac{1}{\chi(H) - 1} - \epsilon\right) e(G)$ edges may be made $(\chi(H) - 1)$-partite by removing at most $\delta p n^2$ edges.

Another example where some success has been achieved is Szemerédi's theorem [56]. This celebrated
theorem states that, for every positive real number $\delta$ and every natural number $k$, there exists
a positive integer $n$ such that every subset of the set $[n] = \{1, 2, \ldots, n\}$ of size at least $\delta n$ contains an
arithmetic progression of length $k$. The particular case where $k = 3$ had been proved much earlier by
Roth [51], and is accordingly known as Roth's theorem. A sparse random version of Roth's theorem
was proved by Kohayakawa, Łuczak and Rödl [33]. To state the theorem, let us say that a subset $I$
of the integers is $\delta$-Roth if every subset of $I$ of size $\delta |I|$ contains an arithmetic progression of length
3. We shall also write $[n]_p$ for a random set in which each element of $[n]$ is chosen independently with
probability $p$.



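Theorem 1.4. For every $\delta > 0$ there exist positive constants $c$ and $C$ such that

$$\lim_{n \to \infty} \mathbb{P}([n]_p \text{ is } \delta\text{-Roth}) = \begin{cases} 0, & \text{if } p < cn^{-1/2}, \\ 1, & \text{if } p > Cn^{-1/2}. \end{cases}$$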

Once again the 0-statement is trivial (as it tends to be for density theorems): if $p = n^{-1/2}/2$,
then the expected number of progressions of length 3 in $[n]_p$ is less than $n^{1/2}/8$, while the expected
number of elements of $[n]_p$ is $n^{1/2}/2$. Therefore, one can almost always remove an element from each
progression and still be left with at least half the elements of $[n]_p$.
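The two expectations in this argument can be checked numerically. The sketch below is illustrative only: the function names and the choice $n = 10{,}000$ are assumptions, and progressions are counted with positive common difference.

```python
def count_3aps(n):
    """Number of progressions a, a + d, a + 2d inside {1, ..., n} with d >= 1."""
    return sum(n - 2 * d for d in range(1, (n - 1) // 2 + 1))


def zero_statement_expectations(n):
    """Expected number of 3-term progressions and of elements of [n]_p
    when p = n^(-1/2) / 2."""
    p = 0.5 * n ** -0.5
    expected_progressions = count_3aps(n) * p ** 3
    expected_elements = n * p
    return expected_progressions, expected_elements


progressions, elements = zero_statement_expectations(10_000)
print(progressions, elements)
```

Since $[n]$ contains fewer than $n^2$ progressions of length 3, the first value is below $p^3 n^2 = n^{1/2}/8$, which in turn is a quarter of the expected size $n^{1/2}/2$ of $[n]_p$.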

For longer progressions, the situation has been much less satisfactory. Let us define a set $I$ of
integers to be $(\delta, k)$-Szemerédi if every subset of $I$ of cardinality at least $\delta |I|$ contains an arithmetic
progression of length $k$. Until recently, hardly anything was known at all about which random sets
were $(\delta, k)$-Szemerédi. However, that changed with the seminal paper of Green and Tao [22], who,
on the way to proving that the primes contain arbitrarily long arithmetic progressions, showed that
every pseudorandom set is $(\delta, k)$-Szemerédi, if "pseudorandom" is defined in an appropriate way. Their
definition of pseudorandomness is somewhat complicated, but it is straightforward to show that quite
sparse random sets are pseudorandom in their sense. From this the following result follows, though
we are not sure whether it has appeared explicitly in print.

Theorem 1.5. For every $\delta > 0$ and every $k \in \mathbb{N}$ there exists a function $p = p(n)$ tending to zero with
$n$ such that

$$\lim_{n \to \infty} \mathbb{P}([n]_p \text{ is } (\delta, k)\text{-Szemerédi}) = 1.$$

The approach of Green and Tao depends heavily on the use of a set of norms known as uniformity
norms, introduced in [17]. In order to deal with arithmetic progressions of length $k$, one must use a
uniformity norm that is based on a count of certain configurations that can be thought of as
$(k - 1)$-dimensional parallelepipeds. These configurations have $k$ degrees of freedom (one for each dimension
and one because the parallelepipeds can be translated) and size $2^{k-1}$. A simple argument (similar to
the arguments for the 0-statements in the density theorems above) shows that the best bound that
one can hope to obtain by their methods is therefore at most $p = n^{-k/2^{k-1}}$. This is far larger than
the bound that arises in the obvious 0-statement for Szemerédi's theorem: the same argument that
gives a bound of $cn^{-1/2}$ for the Roth property gives a bound of $cn^{-1/(k-1)}$ for the Szemerédi property.
However, even this is not the bound that they actually obtain, because they need in addition a
"correlation condition" that is not guaranteed by the smallness of the uniformity norm. This means
that the bound they obtain is of the form $n^{-o(1)}$.

The natural conjecture is that the obvious bound for the 0-statement is in fact correct, so it is far 
stronger than the bound of Green and Tao. 

Conjecture 1.6. For every $\delta > 0$ and every positive integer $k \geq 3$, there exist positive constants $c$
and $C$ such that

$$\lim_{n \to \infty} \mathbb{P}([n]_p \text{ is } (\delta, k)\text{-Szemerédi}) = \begin{cases} 0, & \text{if } p < cn^{-1/(k-1)}, \\ 1, & \text{if } p > Cn^{-1/(k-1)}. \end{cases}$$

One approach to proving Szemerédi's theorem is known as the hypergraph removal lemma. Proved
independently by Nagle, Rödl, Schacht and Skokan [39, 50] and by the second author [19], this theorem
states that for every $\delta > 0$ and every positive integer $k \geq 3$ there exists a constant $\epsilon > 0$ such that if
$G$ is a $k$-uniform hypergraph containing at most $\epsilon n^{k+1}$ copies of the complete $k$-uniform hypergraph
$K^{(k)}_{k+1}$ on $k + 1$ vertices, then it may be made $K^{(k)}_{k+1}$-free by removing at most $\delta n^k$ edges. Once this
theorem is known, Szemerédi's theorem follows as an easy consequence. The question of whether an
analogous result holds within random hypergraphs was posed by Łuczak [37]. For $k = 3$, the result
follows from the work of Kohayakawa, Łuczak and Rödl [33].

Conjecture 1.7. For every $\delta > 0$ and every integer $k \geq 3$ there exist constants $\epsilon > 0$ and $C$ such that,
if $H$ is a random $k$-uniform hypergraph on $n$ vertices where each edge is chosen independently with
probability $p$ at least $Cn^{-1/k}$, then, with probability tending to 1 as $n$ tends to infinity, every subgraph
of $H$ containing at most $\epsilon p^{k+1} n^{k+1}$ copies of the complete $k$-uniform hypergraph $K^{(k)}_{k+1}$ on $k + 1$ vertices
may be made $K^{(k)}_{k+1}$-free by removing at most $\delta p n^k$ edges.

1.1 The main results of this paper

In the next few sections we shall give a very general method for proving sparse random versions of 
combinatorial theorems. This method allows one to obtain sharp bounds for several theorems, of 
which the principal (but by no means only) examples are positive answers to the conjectures we have 
just mentioned. This statement comes with one caveat. When dealing with graphs and hypergraphs, 
we shall restrict our attention to those which are well-balanced in the following sense. Note that most
graphs of interest, including complete graphs and cycles, satisfy this condition. 

Definition 1.8. A $k$-uniform hypergraph $K$ is said to be strictly $k$-balanced if, for every proper subgraph $L$
of $K$,

$$\frac{e_K - 1}{v_K - k} > \frac{e_L - 1}{v_L - k}.$$

The main results we shall prove in this paper (in the order in which we discussed them above, but 
not the order in which we shall prove them) are as follows. The first is a sparse random version of 
Ramsey's theorem. Of course, as we have already mentioned, this is known: however, our theorem 
applies not just to graphs but to hypergraphs, where the problem was wide open apart from a few 
special cases [48, 49]. As we shall see, our methods apply just as easily to hypergraphs as they do 
to graphs. We write $G^{(k)}_{n,p}$ for a random $k$-uniform hypergraph on $n$ vertices, where each hyperedge
is chosen independently with probability $p$. If $K$ is some fixed $k$-uniform hypergraph, we say that a
hypergraph is $(K, r)$-Ramsey if every $r$-colouring of its edges contains a monochromatic copy of $K$.

Theorem 1.9. Given a natural number $r$ and a strictly $k$-balanced $k$-uniform hypergraph $K$, there
exists a positive constant $C$ such that

$$\lim_{n \to \infty} \mathbb{P}(G^{(k)}_{n,p} \text{ is } (K, r)\text{-Ramsey}) = 1, \quad \text{if } p > Cn^{-1/m_k(K)},$$

where $m_k(K) = (e_K - 1)/(v_K - k)$.

One problem that the results of this paper leave open is to establish a corresponding 0-statement
for Theorem 1.9. The above bound is the threshold below which the number of copies of $K$ becomes less
than the number of hyperedges, so the results for graphs make it highly plausible that the 0-statement
holds when $p < cn^{-1/m_k(K)}$ for small enough $c$. However, the example of stars, for which the threshold
is lower than expected, shows that we cannot take this result for granted.

We shall also prove Conjecture 1.2 for strictly 2-balanced graphs. In particular, it holds for complete 
graphs. 

Theorem 1.10. Given $\epsilon > 0$ and a strictly 2-balanced graph $H$, there exists a positive constant $C$
such that

$$\lim_{n \to \infty} \mathbb{P}(G_{n,p} \text{ is } (H, \epsilon)\text{-Turán}) = 1, \quad \text{if } p > Cn^{-1/m_2(H)},$$

where $m_2(H) = (e_H - 1)/(v_H - 2)$.

A slightly more careful application of our methods also allows us to prove its structural counterpart,
Conjecture 1.3, for strictly 2-balanced graphs.

Theorem 1.11. Given a strictly 2-balanced graph $H$ with $\chi(H) \geq 3$ and a constant $\delta > 0$, there exist
positive constants $C$ and $\epsilon$ such that in the random graph $G_{n,p}$ chosen with probability $p \geq Cn^{-1/m_2(H)}$,
where $m_2(H) = (e_H - 1)/(v_H - 2)$, the following holds with probability tending to 1 as $n$ tends to infinity.
Every $H$-free subgraph of $G_{n,p}$ with at least $\left(1 - \frac{1}{\chi(H) - 1} - \epsilon\right) e(G_{n,p})$ edges may be made
$(\chi(H) - 1)$-partite by removing at most $\delta p n^2$ edges.

We also prove Conjecture 1.6, obtaining bounds for the Szemeredi property that are essentially 
best possible. 

Theorem 1.12. Given $\delta > 0$ and a natural number $k \geq 3$, there exists a constant $C$ such that

$$\lim_{n \to \infty} \mathbb{P}([n]_p \text{ is } (\delta, k)\text{-Szemerédi}) = 1, \quad \text{if } p > Cn^{-1/(k-1)}.$$

Our final main result is a proof of Conjecture 1.7, the sparse hypergraph removal lemma. As we
have mentioned, the dense hypergraph removal lemma implies Szemerédi's theorem, but it turns out
that the sparse hypergraph removal lemma does not imply Theorem 1.12. The difficulty is this. When 
we prove Szemeredi's theorem using the removal lemma, we first pass to a hypergraph to which the 
removal lemma can be applied. Unfortunately, in the sparse case, passing from the sparse random set 
to the corresponding hypergraph gives us a sparse hypergraph with dependences between its edges, 
whereas in the sparse hypergraph removal lemma we assume that the edges of the sparse random 
hypergraph are independent. While it is likely that this problem can be overcome, we did not, in the 
light of Theorem 1.12, see a strong reason for doing so. 

In addition to these main results, we shall discuss other density theorems, such as Turán's theorem
for hypergraphs (where, even though the correct bounds are not known in the dense case, we can
obtain the threshold at which the bounds in the sparse random case will be the same), the
multidimensional Szemerédi theorem of Furstenberg and Katznelson [13] and the Bergelson-Leibman theorem
[1] concerning polynomial configurations in dense sets. In the colouring case, we shall discuss Schur's
theorem [53] as a further example. Note that many similar results have also been obtained by a
different method by Schacht [52] and by Friedgut, Rödl and Schacht [10].

1.2 A preliminary description of the argument

The basic idea behind our proof is to use a transference principle to deduce sparse random versions
of density and colouring results from their dense counterparts. To oversimplify slightly, a transference
principle in this context is a statement along the following lines. Let $X$ be a structure such as the
complete graph $K_n$ or the set $\{1, 2, \ldots, n\}$, and let $U$ be a sparse random subset of $X$. Then, for every
subset $A \subset U$, there is a subset $B \subset X$ that has similar properties to $A$. In particular, the density of
$B$ is approximately the same as the relative density of $A$ in $U$, and the number of substructures of a
given kind in $A$ is an appropriate multiple of the number of substructures of the same kind in $B$.

Given a strong enough principle of this kind, one can prove a sparse random version of Szemerédi's
theorem, say, as follows. Let $A$ be a subset of $[n]_p$ of relative density $\delta$. Then there exists a subset $B$ of
$[n]$ of size approximately $\delta n$ such that the number of progressions of length $k$ in $B$ is approximately
$p^{-k}$ times the number of progressions of length $k$ in $A$. From Szemerédi's theorem it can be deduced that
the number of progressions of length $k$ in $B$ is at least $c(\delta) n^2$, so the number of progressions of length
$k$ in $A$ is at least $c(\delta) p^k n^2 / 2$. Since the size of $A$ is about $pn$, we have non-degenerate progressions as
long as $p$ is at least $Cn^{-1/(k-1)}$.

It is very important to the success of the above argument that a dense subset of $[n]$ should contain
not just one progression but several, where "several" means a number that is within a constant of the
trivial upper bound of $n^2$. The other combinatorial theorems discussed above have similarly "robust"
versions and again these are essential to us. Very roughly, our general theorems say that a typical 
combinatorial theorem that is robust in this sense will have a sparse random version with an upper 
bound that is very close to a natural lower bound that is trivial for density theorems and often true, 
even if no longer trivial, for Ramsey theorems. 

It is also very helpful to have a certain degree of homogeneity. For instance, in order to prove
the sparse version of Szemerédi's theorem we use the fact that it is equivalent to the sparse version
of Szemerédi's theorem in $\mathbb{Z}_n$, where we have the nice property that for every $k$ and every $j$ with
$1 \leq j \leq k$, every element $x$ appears in the $j$th place of an arithmetic progression of length $k$ in exactly
$n$ ways (or $n - 1$ if you discount the degenerate progression with common difference 0). It will also
be convenient to assume that $n$ is prime, since in this case we know that for every pair of points $x, y$
in $\mathbb{Z}_n$ there is exactly one arithmetic progression of length $k$ that starts with $x$ and ends in $y$. This
simple homogeneity property will prove useful when we come to do our probabilistic estimates.
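Both homogeneity properties can be verified by brute force for small moduli. The sketch below is illustrative only: the function names are assumptions, and a progression is identified with the pair (first term, common difference) in $\mathbb{Z}_n \times \mathbb{Z}_n$.

```python
def progressions_through(n, j, x):
    """Count pairs (a, d) in Z_n x Z_n whose progression a, a + d, a + 2d, ...
    has x in place j (places are numbered from 1)."""
    return sum(
        1
        for a in range(n)
        for d in range(n)
        if (a + (j - 1) * d) % n == x
    )


def progressions_from_to(n, k, x, y):
    """Count progressions of length k in Z_n that start at x and end at y."""
    return sum(1 for d in range(n) if (x + (k - 1) * d) % n == y)


n, k = 7, 3  # a prime modulus and progressions of length 3

# Every element occupies every place in exactly n progressions.
print({progressions_through(n, j, x) for j in range(1, k + 1) for x in range(n)})

# For prime n, each pair (x, y) is joined by exactly one progression.
print({progressions_from_to(n, k, x, y) for x in range(n) for y in range(n)})

# For composite n this uniqueness can fail, e.g. n = 8 and k = 3.
print(progressions_from_to(8, 3, 0, 1), progressions_from_to(8, 3, 0, 2))
```

The first property holds for every modulus, because the first term is determined by $x$, $j$ and $d$. The second needs $k - 1$ to be invertible modulo $n$, which is automatic when $n$ is prime and $k \leq n$.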

The idea of using a transference principle to obtain sparse random versions of robust combinatorial
statements is not what is new about this paper. In fact, this was exactly the strategy of Green and
Tao in their paper on the primes, and could be said to be the main idea behind their proof (though
of course it took many further ideas to get it to work). It is also possible to regard the proof given
by Kohayakawa, Łuczak and Rödl of the sparse random version of Roth's theorem as involving a
transference principle. The reason, briefly, is that they deduced their result from a sparse random
version of Szemerédi's regularity lemma. But if one has such a regularity lemma together with an
appropriate counting lemma, then one can transfer subgraphs of sparse random graphs to subgraphs
of $K_n$ as follows. If $G$ is a subgraph of $G_{n,p}$, then use the sparse regularity lemma to find a regular
partition of $G$. Suppose two of the vertex sets in this partition are $A$ and $B$, and that the induced
bipartite graph $G(A, B)$ is regular. Then form a random bipartite graph with vertex sets $A$ and $B$
with density $p^{-1}$ times the density of $G(A, B)$ (which is the relative density of $G(A, B)$ inside the
random graph). If you do this for all regular pairs, then the sparse counting lemma (which is far from
trivial to prove) will tell you that the behaviour of the resulting dense graph is similar to the behaviour
of the original graph.

It is difficult to say what is new about our argument without going into slightly more detail, so
we postpone further discussion for now. However, there are three further main ideas involved and we 
shall highlight them as they appear. 

In the next few sections, we shall find a very general set of criteria under which one may transfer
combinatorial statements to the sparse random setting. In Sections 5-8, we shall show how to prove 
that these criteria hold. Section 9 is a brief summary of the general results, both conditional and 
unconditional, that have been proved up to that point. In Section 10, we show how these results may 
be applied to prove the various theorems promised in the introduction. In Section 11, we conclude by 
briefly mentioning some questions that are still open. 

1.3 Notation 

We finish this section with some notation and terminology that we shall need throughout the course 
of the paper. By a measure on a finite set $X$ we shall mean a non-negative function from $X$ to $\mathbb{R}$.
Usually our measures will have average value 1, or very close to 1. The characteristic measure $\mu$ of a
subset $U$ of $X$ will be the function defined by $\mu(x) = |X|/|U|$ if $x \in U$ and $\mu(x) = 0$ otherwise.

Often our set $U$ will be a random subset of $X$ with each element of $X$ chosen with probability
$p$, the choices being independent. In this case, we shall use the shorthand $U = X_p$, just as we wrote
$[n]_p$ for a random subset of $[n]$ in the statement of the sparse random version of Szemerédi's theorem
earlier. When $U = X_p$ it is more convenient to consider the measure $\mu$ that is equal to $p^{-1}$ times the
characteristic function of $U$. That is, $\mu(x) = p^{-1}$ if $x \in U$ and $0$ otherwise. To avoid confusion, we
shall call this the associated measure of $U$. Strictly speaking, we should not say this, since it depends
not just on $U$ but on the value of $p$ used when $U$ was chosen, but this will always be clear from the
context so we shall not bother to call it the associated measure of $(U, p)$.

For an arbitrary function $f$ from $X$ to $\mathbb{R}$ we shall write $\mathbb{E}_x f(x)$ for $|X|^{-1} \sum_{x \in X} f(x)$. Note that
if $\mu$ is the characteristic measure of a set $U$, then $\mathbb{E}_x \mu(x) = 1$ and $\mathbb{E}_x \mu(x) f(x) = \mathbb{E}_{x \in U} f(x)$ for any
function $f$. If $U = X_p$ and $\mu$ is the associated measure of $U$, then we can no longer say this. However,
we can say that the expectation of $\mathbb{E}_x \mu(x)$ is 1. Also, with very high probability the cardinality of $U$
is roughly $p|X|$, so with high probability $\mathbb{E}_x \mu(x)$ is close to 1 and $\mathbb{E}_x \mu(x) f(x)$ is close to $\mathbb{E}_{x \in U} f(x)$ for
all functions $f$.

More generally, if it is clear from the context that $k$ variables $x_1, \ldots, x_k$ range over finite sets
$X_1, \ldots, X_k$, respectively, then $\mathbb{E}_{x_1, \ldots, x_k}$ will be shorthand for $|X_1|^{-1} \cdots |X_k|^{-1} \sum_{x_1 \in X_1} \cdots \sum_{x_k \in X_k}$.
If the range of a variable is not clear from the context then we shall specify it. We define an inner
product for real-valued functions on $X$ by the formula $\langle f, g \rangle = \mathbb{E}_x f(x) g(x)$, and we define the $L_p$ norm
by $\|f\|_p = (\mathbb{E}_x |f(x)|^p)^{1/p}$. In particular, $\|f\|_1 = \mathbb{E}_x |f(x)|$ and $\|f\|_\infty = \max_x |f(x)|$.

Let $\|\cdot\|$ be a norm on the space $\mathbb{R}^X$. The dual norm $\|\cdot\|^*$ of $\|\cdot\|$ is a norm on the collection of linear
functionals $\phi$ acting on $\mathbb{R}^X$ given by

$$\|\phi\|^* = \sup\{|\langle f, \phi \rangle| : \|f\| \leq 1\}.$$

It follows trivially from this definition that $|\langle f, \phi \rangle| \leq \|f\| \, \|\phi\|^*$. Almost as trivially, it follows that if
$|\langle f, \phi \rangle| \leq 1$ whenever $\|f\| \leq \eta$, then $\|\phi\|^* \leq \eta^{-1}$, a fact that will be used repeatedly.
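As a concrete instance of the dual norm, consider the normalised $L_1$ norm: with respect to the inner product $\langle f, g \rangle = \mathbb{E}_x f(x) g(x)$, its dual is the $L_\infty$ norm. The sketch below is illustrative only, with hypothetical function names and functions on $X$ stored as lists. It exhibits a function of $L_1$ norm 1 that attains the supremum in the definition.

```python
def inner(f, g):
    """Normalised inner product <f, g> = E_x f(x) g(x)."""
    return sum(a * b for a, b in zip(f, g)) / len(f)


def l1_norm(f):
    """||f||_1 = E_x |f(x)|."""
    return sum(abs(a) for a in f) / len(f)


def dual_of_l1(phi):
    """Dual of the normalised L_1 norm, namely max_x |phi(x)|."""
    return max(abs(a) for a in phi)


def extremal_function(phi):
    """A function f with ||f||_1 = 1 and <f, phi> = max_x |phi(x)|:
    all of the mass |X| sits on a point where |phi| is largest."""
    size = len(phi)
    x0 = max(range(size), key=lambda x: abs(phi[x]))
    sign = 1.0 if phi[x0] >= 0 else -1.0
    return [sign * size if x == x0 else 0.0 for x in range(size)]


phi = [0.5, -2.0, 1.0, 0.25]
f = extremal_function(phi)
print(l1_norm(f), inner(f, phi), dual_of_l1(phi))
```

The inequality $|\langle g, \phi \rangle| \leq \|g\|_1 \|\phi\|_\infty$ holds for every $g$, so the extremal function shows that the supremum is exactly $\max_x |\phi(x)|$.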

2 Transference principles



As we have already mentioned, a central notion in this paper is that of transference. Roughly speaking,
a transference principle is a theorem that states that every function $f$ in one class can be replaced by
a function $g$ in another, more convenient class in such a way that the properties of $f$ and $g$ are similar.

To understand this concept and why it is useful, let us look at the sparse random version of 
Szemeredi's theorem that we shall prove. Instead of attacking this directly, it is convenient to prove 
a functional generalization of it. The statement we shall prove is the following. 

Theorem 2.1. For every positive integer $k$ and every $\delta > 0$ there are positive constants $c$ and $C$ with
the following property. Let $p \geq Cn^{-1/(k-1)}$ and let $U$ be a random subset of $\mathbb{Z}_n$ where each element is
chosen independently with probability $p$. Let $\mu$ be the associated measure of $U$ and let $f$ be a function
such that $0 \leq f \leq \mu$ and $\mathbb{E}_x f(x) \geq \delta$. Then, with probability tending to 1 as $n$ tends to infinity,

$$\mathbb{E}_{x,d} f(x) f(x + d) \cdots f(x + (k - 1)d) \geq c.$$

To understand the normalization, it is a good exercise (and an easy one) to check that the expected
value of $\mathbb{E}_{x,d} \mu(x) \mu(x + d) \cdots \mu(x + (k - 1)d)$ is close to 1, so that the conclusion of Theorem 2.1 is stating
that $\mathbb{E}_{x,d} f(x) f(x + d) \cdots f(x + (k - 1)d)$ is within a constant of its trivial maximum. (If $p$ is smaller
than $n^{-1/(k-1)}$ then this is no longer true: the main contribution to $\mathbb{E}_{x,d} \mu(x) \mu(x + d) \cdots \mu(x + (k - 1)d)$
comes from the degenerate progressions where $d = 0$.)
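The exercise can be made explicit when $n$ is prime and $k \leq n$. For $d \neq 0$ the $k$ points of the progression are distinct, so the product of the $\mu$ values has expectation $p^{-k} p^k = 1$. For $d = 0$ the product is $\mu(x)^k$, which has expectation $p^{-k} p = p^{1-k}$. Averaging over the $n^2$ pairs $(x, d)$ gives $(n - 1)/n + p^{1-k}/n$. The sketch below evaluates this formula. The function names and the prime $n = 1{,}000{,}003$ are illustrative assumptions.

```python
def expected_progression_average(n, k, p):
    """Exact expectation of E_{x,d} mu(x) mu(x+d) ... mu(x+(k-1)d), where mu is
    the associated measure of a random subset of Z_n with density p.

    Assumes n is prime and k <= n, so the k points are distinct when d != 0.
    """
    nondegenerate = (n - 1) / n    # each term with d != 0 has expectation 1
    degenerate = p ** (1 - k) / n  # each term with d = 0 has expectation p^(1-k)
    return nondegenerate + degenerate


def progression_average(n, k, f):
    """E_{x,d} f(x) f(x+d) ... f(x+(k-1)d) for f given as a list indexed by Z_n."""
    total = 0.0
    for x in range(n):
        for d in range(n):
            term = 1.0
            for i in range(k):
                term *= f[(x + i * d) % n]
            total += term
    return total / n ** 2


n, k = 1_000_003, 3
threshold = n ** (-1 / (k - 1))
print(expected_progression_average(n, k, 100 * threshold))   # close to 1
print(expected_progression_average(n, k, 0.01 * threshold))  # dominated by d = 0
```

The degenerate term $p^{1-k}/n$ is small exactly when $p$ is much larger than $n^{-1/(k-1)}$, which is the range of $p$ in Theorem 2.1.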

Our strategy for proving this theorem is to "transfer" it from the sparse set $U$ to $\mathbb{Z}_n$ itself and
then to deduce it from the following robust functional version of Szemeredi's theorem, which can be 
proved by a simple averaging argument due essentially to Varnavides [62]. 

Theorem 2.2. For every $\delta > 0$ and every positive integer $k$ there is a constant $c > 0$ such that, for
every positive integer $n$, every function $g : \mathbb{Z}_n \to [0, 1]$ with $\mathbb{E}_x g(x) \geq \delta$ satisfies the inequality

$$\mathbb{E}_{x,d} g(x) g(x + d) \cdots g(x + (k - 1)d) \geq c.$$

Note that in this statement we are no longer talking about dense subsets of $\mathbb{Z}_n$, but rather about
$[0, 1]$-valued functions defined on $\mathbb{Z}_n$ with positive expectation. It will be important in what follows
that any particular theorem we wish to transfer has such an equivalent functional formulation. As we 
shall see in Section 4, all of the theorems that we consider do have such formulations. 

Returning to transference principles, our aim is to find a function $g$ with $0 \leq g \leq 1$ for which we
can prove that $\mathbb{E}_x g(x) \approx \mathbb{E}_x f(x)$ and that

$$\mathbb{E}_{x,d} g(x) g(x + d) \cdots g(x + (k - 1)d) \approx \mathbb{E}_{x,d} f(x) f(x + d) \cdots f(x + (k - 1)d).$$

We can then argue as follows: if $\mathbb{E}_x f(x) \geq \delta$, then $\mathbb{E}_x g(x) \geq \delta/2$; by Theorem 2.2 it follows that
$\mathbb{E}_{x,d} g(x) g(x + d) \cdots g(x + (k - 1)d)$ is bounded below by a constant $c$; and this implies that
$\mathbb{E}_{x,d} f(x) f(x + d) \cdots f(x + (k - 1)d) \geq c/2$.

In the rest of this section we shall show how the Hahn-Banach theorem can be used to prove general 
transference principles. This was first demonstrated by the second author in [20], and independently 
(in a slightly different language) by Reingold, Trevisan, Tulsiani and Vadhan [43], and leads to simpler 
proofs than the method used by Green and Tao. The first transference principle we shall prove is 
particularly appropriate for density theorems: this one was shown in [20] but for convenience we 
repeat the proof. Then we shall prove a modification of it for use with colouring theorems. 

Let us begin by stating the finite-dimensional Hahn-Banach theorem in its separation version.

Lemma 2.3. Let $K$ be a closed convex set in $\mathbb{R}^n$ containing $0$ and let $v$ be a vector that does not
belong to $K$. Then there is a real number $t$ and a linear functional $\phi$ such that $\phi(v) > t$ and such that
$\phi(w) \leq t$ for every $w \in K$.

The reason the Hahn-Banach theorem is useful to us is that one often wishes to prove that one
function is a sum of others with certain properties, and often the sets of functions that satisfy those
properties are convex (or can easily be made convex). For instance, we shall want to write a function
$f$ with $0 \leq f \leq \mu$ as a sum $g + h$ with $0 \leq g \leq 1$ and with $h$ small in a certain norm. The following
lemma, an almost immediate consequence of Lemma 2.3, tells us what happens when a function cannot
be decomposed in this way. We implicitly use the fact that every linear functional on $\mathbb{R}^Y$ has the form
$f \mapsto \langle f, \phi \rangle$ for some $\phi$.

Lemma 2.4. Let $Y$ be a finite set and let $K$ and $L$ be two subsets of $\mathbb{R}^Y$ that are closed and convex
and that contain $0$. Suppose that $f \notin K + L$. Then there exists a function $\phi \in \mathbb{R}^Y$ such that $\langle f, \phi \rangle > 1$
and such that $\langle g, \phi \rangle \leq 1$ for every $g \in K$ and $\langle h, \phi \rangle \leq 1$ for every $h \in L$.

Proof. By Lemma 2.3 there is a function $\phi$ and a real number $t$ such that $\langle f, \phi \rangle > t$ and such that
$\langle g + h, \phi \rangle \leq t$ whenever $g \in K$ and $h \in L$. Setting $h = 0$ we deduce that $\langle g, \phi \rangle \leq t$ for every $g \in K$,
and setting $g = 0$ we deduce that $\langle h, \phi \rangle \leq t$ for every $h \in L$. Setting $g = h = 0$ we deduce that $t \geq 0$.
Dividing through by $t$ (or by $\frac{1}{2} \langle f, \phi \rangle$ if $t = 0$) we see that we may take $t$ to be 1. □

Now let us prove our two transference principles, beginning with the density one. In the statement 
of the theorem below we write $\phi_+$ for the positive part of $\phi$.

Lemma 2.5. Let $\epsilon$ and $\eta$ be positive real numbers, let $\mu$ and $\nu$ be non-negative functions defined on a
finite set $X$ and let $\|\cdot\|$ be a norm on $\mathbb{R}^X$. Suppose that $\langle \mu - \nu, \phi_+ \rangle \leq \epsilon$ whenever $\|\phi\|^* \leq \eta^{-1}$. Then for
every function $f$ with $0 \leq f \leq \mu$ there exists a function $g$ with $0 \leq g \leq \nu$ such that $\|(1 + \epsilon)^{-1} f - g\| \leq \eta$.

Proof. If we cannot approximate $(1 + \epsilon)^{-1} f$ in this way, then we cannot write $(1 + \epsilon)^{-1} f$ as a sum
$g + h$ with $0 \le g \le \nu$ and $\|h\| \le \eta$. Now the sets $K = \{g : 0 \le g \le \nu\}$ and $L = \{h : \|h\| \le \eta\}$ are
closed and convex and they both contain 0. It follows from Lemma 2.4, with $Y = X$, that there is a
function $\phi$ with the following three properties.

• $\langle (1 + \epsilon)^{-1} f, \phi \rangle > 1$;

• $\langle g, \phi \rangle \le 1$ whenever $0 \le g \le \nu$;

• $\langle h, \phi \rangle \le 1$ whenever $\|h\| \le \eta$.

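From the first of these properties we deduce that $\langle f, \phi \rangle > 1 + \epsilon$. From the second we deduce that
$\langle \nu, \phi_+ \rangle \le 1$, since the function $g$ that takes the value $\nu(x)$ when $\phi(x) \ge 0$ and $0$ otherwise maximizes
the value of $\langle g, \phi \rangle$ over all $g \in K$. And from the third property we deduce immediately that $\|\phi\|^* \le \eta^{-1}$.
But our hypothesis implies that $\langle \mu, \phi_+ \rangle < \langle \nu, \phi_+ \rangle + \epsilon$. It therefore follows that

$$1 + \epsilon < \langle f, \phi \rangle \le \langle f, \phi_+ \rangle \le \langle \mu, \phi_+ \rangle < \langle \nu, \phi_+ \rangle + \epsilon \le 1 + \epsilon,$$

which is a contradiction. □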
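
The second deduction in this proof rests on a pointwise maximization, which is easy to check numerically. The following Python sketch is only an illustration under assumptions of ours: the inner product is taken to be the normalized sum $\mathbb{E}_x f(x) g(x)$, the functions `nu` and `phi` are random test data, and `inner`, `positive_part` and `box_maximizer` are hypothetical helper names.

```python
import random

def inner(f, g):
    """Normalized inner product E_x f(x) g(x) on a finite set (assumed convention)."""
    return sum(a * b for a, b in zip(f, g)) / len(f)

def positive_part(phi):
    """Pointwise positive part of phi."""
    return [max(v, 0.0) for v in phi]

def box_maximizer(nu, phi):
    """The function g equal to nu(x) where phi(x) >= 0 and to 0 elsewhere."""
    return [n if p >= 0 else 0.0 for n, p in zip(nu, phi)]

rng = random.Random(0)
size = 8
nu = [rng.uniform(0.0, 2.0) for _ in range(size)]
phi = [rng.uniform(-1.0, 1.0) for _ in range(size)]

# Value attained by the claimed maximizer, and the quantity <nu, phi_+>.
best = inner(box_maximizer(nu, phi), phi)
target = inner(nu, positive_part(phi))

# Random functions g with 0 <= g <= nu, used to probe the maximality claim.
samples = [[rng.uniform(0.0, n) for n in nu] for _ in range(1000)]
largest_sampled = max(inner(g, phi) for g in samples)
```

Here `best` and `target` coincide, and no sampled $g$ with $0 \le g \le \nu$ gives a larger inner product, in line with the bound $\langle g, \phi \rangle \le \langle \nu, \phi_+ \rangle$ on $K$.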

Later we shall apply Lemma 2.5 with $\mu$ the associated measure of a sparse random set and $\nu$ the
constant measure 1.

The next transference principle is the one that we shall use for obtaining sparse random colouring
theorems. It may seem strange that the condition we obtain on $g_1 + \dots + g_r$ is merely that it is less
than $\nu$ (rather than equal to $\nu$). However, we also show that $f_i$ and $g_i$ are close in a certain sense,
and in applications that will imply that $g_1 + \dots + g_r$ is indeed approximately equal to $\nu$ (which will
be the constant measure 1). With a bit more effort, one could obtain equality from the Hahn-Banach
method, but this would not make life easier later, since the robust versions of Ramsey theorems hold
just as well when you colour almost everything as they do when you colour everything.

Lemma 2.6. Let $\epsilon$ and $\eta$ be positive real numbers, let $r$ be a positive integer, let $\mu$ and $\nu$ be non-negative
functions defined on a finite set $X$ and let $\|\cdot\|$ be a norm on $\mathbb{R}^X$. Suppose that $\langle \mu - \nu, (\max_{1 \le i \le r} \phi_i)_+ \rangle < \epsilon$
whenever $\phi_1, \dots, \phi_r$ are functions with $\|\phi_i\|^* \le \eta^{-1}$ for each $i$. Then for every sequence of $r$ functions
$f_1, \dots, f_r$ with $0 \le f_i \le \mu$ for each $i$ and $f_1 + \dots + f_r \le \mu$ there exist functions $g_1, \dots, g_r$ with $0 \le g_i$
and $g_1 + \dots + g_r \le \nu$ such that $\|(1 + \epsilon)^{-1} f_i - g_i\| \le \eta$ for every $i$.

Proof. Suppose that the result does not hold for the $r$-tuple $(f_1, \dots, f_r)$. Let $K$ be the closed convex
set of all $r$-tuples of functions $(g_1, \dots, g_r)$ such that $0 \le g_i$ and $g_1 + \dots + g_r \le \nu$, and let $L$ be the closed
convex set of all $r$-tuples $(h_1, \dots, h_r)$ such that $\|h_i\| \le \eta$ for every $i$. Then both $K$ and $L$ contain 0
and our hypothesis is that $(1 + \epsilon)^{-1}(f_1, \dots, f_r) \notin K + L$. Therefore, Lemma 2.4, with $Y = X^r$, gives
us an $r$-tuple of functions $(\phi_1, \dots, \phi_r)$ with the following three properties.

• $\sum_{i=1}^r \langle (1 + \epsilon)^{-1} f_i, \phi_i \rangle > 1$;

• $\sum_{i=1}^r \langle g_i, \phi_i \rangle \le 1$ whenever $0 \le g_i$ for each $i$ and $g_1 + \dots + g_r \le \nu$;

• $\sum_{i=1}^r \langle h_i, \phi_i \rangle \le 1$ whenever $\|h_i\| \le \eta$ for every $i$.

The first of these conditions implies that $\sum_{i=1}^r \langle f_i, \phi_i \rangle > 1 + \epsilon$. In the second condition, let us choose
the functions $g_i$ as follows. For each $x$, pick an $i$ such that $\phi_i(x)$ is maximal. If $\phi_i(x) \ge 0$, then set $g_i(x)$
to be $\nu(x)$, and otherwise set $g_i(x) = 0$. For each $j \ne i$, set $g_j(x)$ to be zero. Then $\sum_{i=1}^r g_i(x) \phi_i(x)$ is
equal to $\nu(x) \max_i \phi_i(x)$ if this maximum is non-negative, and 0 otherwise. Therefore, $\sum_{i=1}^r \langle g_i, \phi_i \rangle =
\langle \nu, (\max_i \phi_i)_+ \rangle$. Thus, it follows from the second condition that $\langle \nu, (\max_i \phi_i)_+ \rangle \le 1$. Let us write $\phi$
for $\max_i \phi_i$. The third condition implies that $\|\phi_i\|^* \le \eta^{-1}$ for each $i$.

Using this information together with our hypothesis about $\mu - \nu$, we find that

$$1 + \epsilon < \sum_{i=1}^r \langle f_i, \phi_i \rangle \le \sum_{i=1}^r \langle f_i, \phi_+ \rangle \le \langle \mu, \phi_+ \rangle < \langle \nu, \phi_+ \rangle + \epsilon \le 1 + \epsilon,$$

a contradiction. □
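
The choice of the functions $g_i$ made in this proof can be tested on random data. The Python sketch below is an illustration only, assuming the normalized inner product $\mathbb{E}_x f(x) g(x)$; the arrays `nu` and `phis` are arbitrary test data and all names are hypothetical.

```python
import random

def inner(f, g):
    """Normalized inner product E_x f(x) g(x) (assumed convention)."""
    return sum(a * b for a, b in zip(f, g)) / len(f)

rng = random.Random(1)
size, r = 6, 3
nu = [rng.uniform(0.0, 2.0) for _ in range(size)]
phis = [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(r)]

# At each x, give the whole weight nu(x) to an index i maximizing phi_i(x),
# but only when that maximum is non-negative; all other g_j(x) stay zero.
gs = [[0.0] * size for _ in range(r)]
for x in range(size):
    i = max(range(r), key=lambda idx: phis[idx][x])
    if phis[i][x] >= 0:
        gs[i][x] = nu[x]

lhs = sum(inner(gs[i], phis[i]) for i in range(r))
max_phi_plus = [max(0.0, max(phis[i][x] for i in range(r))) for x in range(size)]
rhs = inner(nu, max_phi_plus)
total_weight = [sum(gs[i][x] for i in range(r)) for x in range(size)]
```

The values `lhs` and `rhs` agree, and `total_weight` never exceeds `nu`, so the tuple $(g_1, \dots, g_r)$ lies in $K$ and attains $\langle \nu, (\max_i \phi_i)_+ \rangle$.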

3 The counting lemma 

We now come to the second main idea of the paper, and perhaps the main new idea. Lemmas 2.5
and 2.6 will be very useful to us, but as they stand they are rather abstract: in order to make use
of them we need to find a norm $\|\cdot\|$ such that if $\|f - g\|$ is small then $f$ and $g$ behave similarly in a
relevant way. Several norms have been devised for exactly this purpose, such as the uniformity norms
mentioned earlier, and also "box norms" for multidimensional structures and "octahedral norms" for
graphs and hypergraphs. It might therefore seem natural to try to apply Lemmas 2.5 and 2.6 to these
norms. However, as we have already commented in the case of uniformity norms, if we do this then we
cannot obtain sharp bounds: except in a few cases, these norms are related to counts of configurations
that are too large to appear nondegenerately in very sparse random sets.

We are therefore forced to adopt a different approach. Instead of trying to use an off-the-shelf
norm, we use a bespoke norm, designed to fit perfectly the problem at hand. Notice that Lemmas
2.5 and 2.6 become harder to apply as the norm $\|\cdot\|$ gets bigger, since then the dual norm $\|\cdot\|^*$ gets
smaller and there are more functions $\phi$ with $\|\phi\|^* \le \eta^{-1}$, and therefore more functions of the form
$\phi_+$ for which one must show that $\langle \mu - \nu, \phi_+ \rangle < \epsilon$ (and similarly for $(\max_{1 \le i \le r} \phi_i)_+$ with colouring
problems). Therefore, we shall try to make our norm as small as possible, subject to the condition we
need it to satisfy: that $f$ and $g$ behave similarly if $\|f - g\|$ is small.

Thus, our norm will be defined by means of a universal construction. As with other universal 
constructions, this makes the norm easy to define but hard to understand concretely. However, we can 
get away with surprisingly little understanding of its detailed behaviour, as will become clear later. 
An advantage of this abstract approach is that it has very little dependence on the particular problem 
that is being studied: it is for that reason that we have ended up with a very general result. 

Before we define the norm, let us describe the general set-up that we shall analyse. We shall begin
with a finite set $X$ and a collection $S$ of ordered subsets of $X$, each of size $k$. Thus, any element $s \in S$
may be expressed in the form $s = (s_1, \dots, s_k)$.

Here are two examples. When we apply our results to Szemeredi's theorem, we shall take $X$ to be
$\mathbb{Z}_n$, and $S$ to be the set of ordered $k$-tuples of the form $(x, x + d, \dots, x + (k - 1)d)$, and when we apply
it to Ramsey's theorem or Turan's theorem for $K_4$, we shall take $X$ to be the complete graph $K_n$ and
$S$ to be the set of ordered sextuples of pairs of the form $(x_1x_2, x_1x_3, x_1x_4, x_2x_3, x_2x_4, x_3x_4)$, where $x_1$,
$x_2$, $x_3$ and $x_4$ are vertices of $K_n$. Depending on the particular circumstance, we shall choose whether
to include or ignore degenerate configurations. For example, for Szemeredi's theorem, it is convenient
to include the possibility that $d = 0$, but for theorems involving $K_4$, we restrict to configurations
where $x_1$, $x_2$, $x_3$ and $x_4$ are all distinct. In practice, it makes little difference, since degenerate
configurations are never very numerous.

In both of these examples, the collection $S$ of ordered subsets of $X$ has some nice homogeneity
properties, which we shall assume for our general result because it makes the proofs cleaner, even if
one sometimes has to work a little to show that these properties may be assumed.

Definition 3.1. Let $S$ be a collection of ordered $k$-tuples $s = (s_1, \dots, s_k)$ of elements of a finite set $X$,
and let us write $S_j(x)$ for the set of all $s$ in $S$ such that $s_j = x$. We shall say that $S$ is homogeneous
if for each $j$ the sets $S_j(x)$ all have the same size.

We shall assume throughout that our sets of ordered $k$-tuples are homogeneous in this sense. Note
that this assumption does not hold for arithmetic progressions of length $k$ if we work in the set $[n]$
rather than the set $\mathbb{Z}_n$. However, sparse random Szemeredi for $\mathbb{Z}_n$ implies sparse random Szemeredi
for $[n]$, so this does not bother us. Similar observations can be used to convert several other problems
into equivalent ones for which the set $S$ is homogeneous. Moreover, such observations will easily
accommodate any further homogeneity assumptions that we have to introduce in later sections.
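
The homogeneity condition, and its failure for progressions in an interval, can be confirmed by direct enumeration. The Python sketch below is an illustration under our own conventions: the interval is taken to be $\{0, \dots, n-1\}$ rather than $[n]$, only common differences $d \ge 0$ are used there, and the function names are hypothetical.

```python
from collections import Counter

def progressions_mod_n(n, k):
    """Ordered k-tuples (x, x+d, ..., x+(k-1)d) in Z_n, with d = 0 allowed."""
    return [tuple((x + i * d) % n for i in range(k))
            for x in range(n) for d in range(n)]

def progressions_in_interval(n, k):
    """Ordered k-term progressions with d >= 0 lying inside {0, ..., n-1}."""
    return [tuple(x + i * d for i in range(k))
            for x in range(n) for d in range(n)
            if x + (k - 1) * d < n]

def is_homogeneous(tuples, ground_set, k):
    """Check that |S_j(x)| is the same for every x, for each coordinate j."""
    for j in range(k):
        counts = Counter(s[j] for s in tuples)
        if len({counts[x] for x in ground_set}) != 1:
            return False
    return True

n, k = 7, 3
cyclic = is_homogeneous(progressions_mod_n(n, k), range(n), k)
interval = is_homogeneous(progressions_in_interval(n, k), range(n), k)
```

In the cyclic case every $S_j(x)$ has exactly $n$ elements, whereas in the interval the point $0$ starts more progressions than the point $n - 1$ does, so `cyclic` is true and `interval` is false.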

The functional version of a combinatorial theorem about the ordered sets in $S$ will involve
expressions such as

$$\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k).$$

Thus, what we wish to do is define a norm $\|\cdot\|$ with the property that

$$\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k) - \mathbb{E}_{s \in S} g(s_1) \cdots g(s_k)$$

can be bounded above in terms of $\|f - g\|$ whenever $0 \le f \le \mu$ and $0 \le g \le \nu$. This is what we mean
by saying that $f$ and $g$ should behave similarly when $\|f - g\|$ is small.

The feature of the problem that gives us a simple and natural norm is the $k$-linearity of the
expression $\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k)$, which allows us to write the above difference as

$$\sum_{j=1}^k \mathbb{E}_{s \in S}\, g(s_1) \cdots g(s_{j-1}) (f - g)(s_j) f(s_{j+1}) \cdots f(s_k).$$

Because we are assuming that the sets $S_j(x)$ all have the same size, we can write any expression of
the form $\mathbb{E}_{s \in S} h_1(s_1) \cdots h_k(s_k)$ as

$$\mathbb{E}_{x \in X}\, h_j(x)\, \mathbb{E}_{s \in S_j(x)} h_1(s_1) \cdots h_{j-1}(s_{j-1}) h_{j+1}(s_{j+1}) \cdots h_k(s_k).$$

It will be very convenient to introduce some terminology and notation for expressions of the kind that
are beginning to appear.

Definition 3.2. Let $X$ be a finite set and let $S$ be a homogeneous collection of ordered subsets of $X$,
each of size $k$. Then, given $k$ functions $h_1, \dots, h_k$ from $X$ to $\mathbb{R}$, their $j$th convolution is defined to be
the function

$$*_j(h_1, \dots, h_k)(x) = \mathbb{E}_{s \in S_j(x)} h_1(s_1) \cdots h_{j-1}(s_{j-1}) h_{j+1}(s_{j+1}) \cdots h_k(s_k).$$
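
To make the definition concrete, the following Python sketch computes the $j$th convolution for progressions in $\mathbb{Z}_n$ and checks, in exact arithmetic, that $\mathbb{E}_{s \in S} h_1(s_1) \cdots h_k(s_k) = \mathbb{E}_x h_j(x) \, {*_j}(h_1, \dots, h_k)(x)$ for each $j$. It is a sketch under assumptions of ours: the test functions `hs` are arbitrary, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import prod

def progressions_mod_n(n, k):
    """Ordered k-tuples (x, x+d, ..., x+(k-1)d) in Z_n, with d = 0 allowed."""
    return [tuple((x + i * d) % n for i in range(k))
            for x in range(n) for d in range(n)]

def convolution(S, n, j, hs):
    """j-th convolution (j is 1-based): for each x, the average over s in S_j(x)
    of the product of h_i(s_i) over all coordinates i other than j."""
    result = []
    for x in range(n):
        rooted = [s for s in S if s[j - 1] == x]  # this is S_j(x)
        total = sum(prod(hs[i][s[i]] for i in range(len(s)) if i != j - 1)
                    for s in rooted)
        result.append(total / len(rooted))
    return result

n, k = 5, 3
S = progressions_mod_n(n, k)
# Arbitrary rational test functions h_1, ..., h_k on Z_n.
hs = [[Fraction((3 * x + 2 * i + 1) % 7, 7) for x in range(n)] for i in range(k)]

count = sum(prod(hs[i][s[i]] for i in range(k)) for s in S) / len(S)
rooted_counts = [
    sum(hs[j - 1][x] * c for x, c in enumerate(convolution(S, n, j, hs))) / n
    for j in range(1, k + 1)
]
```

Each entry of `rooted_counts` equals `count`. The agreement depends on homogeneity, since it uses $|S| = |X| \cdot |S_j(x)|$ for every $x$.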

We call this a convolution because in the special case where $S$ is the set of arithmetic progressions
of length 3 in $\mathbb{Z}_N$, we obtain convolutions in the conventional sense. Using this notation and the
observation made above, we can rewrite

$$\mathbb{E}_{s \in S}\, g(s_1) \cdots g(s_{j-1}) (f - g)(s_j) f(s_{j+1}) \cdots f(s_k)$$

as $\langle f - g, *_j(g, \dots, g, f, \dots, f) \rangle$, and from that we obtain the identity

$$\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k) - \mathbb{E}_{s \in S} g(s_1) \cdots g(s_k) = \sum_{j=1}^k \langle f - g, *_j(g, \dots, g, f, \dots, f) \rangle.$$

This, together with the triangle inequality, gives us the following lemma.

Lemma 3.3. Let $X$ be a finite set and let $S$ be a homogeneous collection of ordered subsets of $X$ of
size $k$. Let $f$ and $g$ be two functions defined on $X$. Then

$$|\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k) - \mathbb{E}_{s \in S} g(s_1) \cdots g(s_k)| \le \sum_{j=1}^k |\langle f - g, *_j(g, \dots, g, f, \dots, f) \rangle|.$$
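
The telescoping identity behind this lemma can be verified in exact arithmetic. The Python sketch below is illustrative only: it uses progressions in $\mathbb{Z}_5$ and two arbitrary rational test functions `f` and `g`, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import prod

def progressions_mod_n(n, k):
    """Ordered k-tuples (x, x+d, ..., x+(k-1)d) in Z_n, with d = 0 allowed."""
    return [tuple((x + i * d) % n for i in range(k))
            for x in range(n) for d in range(n)]

def average_product(S, hs):
    """E_{s in S} h_1(s_1) ... h_k(s_k) for a list hs of k functions."""
    return sum(prod(h[x] for h, x in zip(hs, s)) for s in S) / len(S)

n, k = 5, 3
S = progressions_mod_n(n, k)
f = [Fraction(x + 1, 3) for x in range(n)]
g = [Fraction(1, x + 2) for x in range(n)]
diff = [a - b for a, b in zip(f, g)]

# Left-hand side: the difference of the two counts.
difference = average_product(S, [f] * k) - average_product(S, [g] * k)

# j-th term: g in coordinates before j, f - g in coordinate j, f afterwards.
terms = [average_product(S, [g] * (j - 1) + [diff] + [f] * (k - j))
         for j in range(1, k + 1)]
```

The sum of `terms` equals `difference` exactly, and the triangle inequality then bounds the absolute value of `difference` by the sum of the absolute values of the individual terms.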

It follows that if $f - g$ has small inner product with all functions of the form $*_j(g, \dots, g, f, \dots, f)$,
then $\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k)$ and $\mathbb{E}_{s \in S} g(s_1) \cdots g(s_k)$ are close. It is tempting, therefore, to define a norm $\|\cdot\|$
by taking $\|h\|$ to be the maximum value of $|\langle h, \phi \rangle|$ over all functions $\phi$ of the form $*_j(g, \dots, g, f, \dots, f)$
for which $0 \le g \le 1$ and $0 \le f \le \mu$. If we did that, then we would know that $|\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k) -
\mathbb{E}_{s \in S} g(s_1) \cdots g(s_k)|$ was small whenever $\|f - g\|$ was small, which is exactly the property we need our
norm to have. Unfortunately, this definition leads to difficulties. To see why, we need to look in more
detail at the convolutions.

Any convolution $*_j(g, \dots, g, f, \dots, f)$ is bounded above by $*_j(\nu, \dots, \nu, \mu, \dots, \mu)$. For the sake of
example, let us consider the case of Szemeredi's theorem. Taking $\nu = 1$, we see that the $j$th convolution
is bounded above by the function

$$P_j(x) = \mathbb{E}_d\, \mu(x + d) \cdots \mu(x + (k - j)d).$$

Up to normalization, this counts the number of progressions of length $k - j$ beginning at $x$. If
$j > 1$, probabilistic estimates imply that, at the critical probability $p = Cn^{-1/(k-1)}$, $P_j$ is, with
high probability, $L_\infty$-bounded (that is, the largest value of the function is bounded by some absolute
constant). However, functions of the form $*_1(f, \dots, f)$ with $0 \le f \le \mu$ are almost always unbounded.
This makes it much more difficult to control their inner products with $f - g$, and we need to do that
if we wish to apply the abstract transference principle from the previous section.

For graphs, a similar problem arises. The $j$th convolution will count, up to normalization, the
number of copies of some subgraph of the given graph $H$ that are rooted on a particular edge. If we
assume that the graph is balanced, as we are doing, then, at probability $p = Cn^{-1/m_2(H)}$, this count
will be $L_\infty$-bounded for any proper subgraph of $H$. However, for $H$ itself, we do not have this luxury
and the function $*_1(f, \dots, f)$ is again likely to be unbounded.

If we were prepared to increase the density of the random set by a polylogarithmic factor, we could
ensure that even $*_1(f, \dots, f)$ was bounded and this problem would go away. Thus, a significant part of
the complication of this paper is due to our wish to get a bound that is best possible up to a constant.

There are two natural ways of getting round the difficulty if we are not prepared to sacrifice a
polylogarithmic factor. One is to try to exploit the fact that although $*_1(f, \dots, f)$ is not bounded,
it typically takes large values very infrequently, so it is "close to bounded" in a certain sense. The
other is to replace $*_1(f, \dots, f)$ by a modification of the function that has been truncated at a certain
maximum. It seems likely that both approaches can be made to work: we have found it technically
easier to go for the second. The relevant definition is as follows.

Definition 3.4. Let $X$ be a finite set and let $S$ be a homogeneous collection of ordered subsets of $X$
of size $k$. Then, given $k$ non-negative functions $h_1, \dots, h_k$ from $X$ to $\mathbb{R}$, their $j$th capped convolution
$\circ_j(h_1, \dots, h_k)$ is defined by

$$\circ_j(h_1, \dots, h_k)(x) = \min\{*_j(h_1, \dots, h_k)(x), 2\}.$$
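
A small exact computation shows the capping at work. The Python sketch below is an illustration under assumptions of ours: $S$ is the set of 3-term progressions in $\mathbb{Z}_{13}$, the set `U` is a hand-picked three-element set rather than a random one, and `mu` is normalized as $(n/|U|) 1_U$ so that it has mean 1. All names are hypothetical.

```python
from fractions import Fraction

def rooted_progressions(n, k, x):
    """S_1(x): k-term progressions in Z_n starting at x (d = 0 allowed)."""
    return [tuple((x + i * d) % n for i in range(k)) for d in range(n)]

def first_convolution(mu, n, k, x):
    """Value of *_1(mu, ..., mu) at x: average over S_1(x) of mu(s_2)...mu(s_k)."""
    total = Fraction(0)
    for s in rooted_progressions(n, k, x):
        term = Fraction(1)
        for point in s[1:]:
            term *= mu[point]
        total += term
    return total / n

def capped_first_convolution(mu, n, k, x, cap=2):
    """Capped convolution: the ordinary convolution truncated at the value 2."""
    return min(first_convolution(mu, n, k, x), cap)

n, k = 13, 3
U = {0, 1, 2}
mu = [Fraction(n, len(U)) if x in U else Fraction(0) for x in range(n)]

plain = [first_convolution(mu, n, k, x) for x in range(n)]
capped = [capped_first_convolution(mu, n, k, x) for x in range(n)]
```

At $x = 0$ the two progressions $(0, 0, 0)$ and $(0, 1, 2)$ lie in `U`, which pushes the uncapped value above 2, so the cap takes effect there; at $x = 5$ no progression starting at $x$ has its other two terms in `U`, and both versions vanish.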

Unlike with ordinary convolutions, there is no obvious way of controlling the difference between
$\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k)$ and $\mathbb{E}_{s \in S} g(s_1) \cdots g(s_k)$ in terms of the inner product between $f - g$ and suitably
chosen capped convolutions. So instead we shall look at a quantity that is related in a different way
to the number of substructures of the required type. Roughly speaking, it counts the number of
substructures, but does not count too many if they start from the same point.

A natural quantity that fits this description is $\langle f, \circ_1(f, f, \dots, f) \rangle$, and this is indeed closely related
to the quantity we shall actually consider. However, there is an additional complication, which is that,
for reasons that we shall explain later, it is very convenient to think of our random set $U$ as a union
of $m$ random sets $U_1, \dots, U_m$, and of a function defined on $U$ as an average $m^{-1}(f_1 + \dots + f_m)$ of
functions with $f_i$ defined on $U_i$. More precisely, we shall take $m$ independent random sets $U_1, \dots, U_m$,
each distributed as $X_p$. (Recall that $X_p$ stands for a random subset of $X$ where the elements are
chosen independently with probability $p$.) Writing $\mu_1, \dots, \mu_m$ for their associated measures, for each
$i$ we shall take a function $f_i$ such that $0 \le f_i \le \mu_i$. Our assertion will then be about the average
$f = m^{-1}(f_1 + \dots + f_m)$. Note that $0 \le f \le \mu$, where $\mu = m^{-1}(\mu_1 + \dots + \mu_m)$, and that every function
$f$ with $0 \le f \le \mu$ can be expressed as an average of functions $f_i$ with $0 \le f_i \le \mu_i$. Note also that
if $U = U_1 \cup \dots \cup U_m$ then $\mu$ is neither the characteristic measure of $U$ nor the associated measure of
$U$. However, provided $p$ is fairly small, it is close to both with high probability, and this is all that
matters.

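Having chosen $f$ in this way, the quantity we shall then look at is

$$\Big\langle f, m^{-(k-1)} \sum_{i_2, \dots, i_k} \circ_1(f_{i_2}, \dots, f_{i_k}) \Big\rangle = \mathbb{E}_{i_1, \dots, i_k \in \{1, \dots, m\}} \langle f_{i_1}, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle.$$

In other words, we expand the expression $\langle f, *_1(f, f, \dots, f) \rangle$ in terms of $f_1, \dots, f_m$ and then do the
capping term by term.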

Our aim will be to find a bounded non-negative function $g$ such that the average $\mathbb{E}_x g(x)$ is bounded
away from zero, and such that $\langle g, *_1(g, g, \dots, g) \rangle$ is close to $\langle f, *_1(f, f, \dots, f) \rangle$. Central to our approach
is a "counting lemma", which is an easy corollary of the following result, which keeps track of the
errors that are introduced by our "capping". (To understand the statement, observe that if we replaced
the capped convolutions $\circ_j$ by their "genuine" counterparts $*_j$, then the two quantities that we are
comparing would become equal.) In the next lemma, we assume that a homogeneous set $S$ of ordered
$k$-tuples has been given.

Lemma 3.5. Let $\eta > 0$, let $m \ge 2k^3/\eta$ and let $\mu_1, \dots, \mu_m$ be non-negative functions defined on $X$
with $\|\mu_i\|_1 \le 2$ for all $i$. Suppose that $\|*_1(\mu_{i_2}, \dots, \mu_{i_k}) - \circ_1(\mu_{i_2}, \dots, \mu_{i_k})\|_1 \le \eta$ whenever $i_2, \dots, i_k$
are distinct integers between 1 and $m$, and also that $*_j(1, 1, \dots, 1, \mu_{i_{j+1}}, \dots, \mu_{i_k})$ is uniformly bounded
above by 2 whenever $j \ge 2$ and $i_{j+1}, \dots, i_k$ are distinct. For each $i$ let $f_i$ be a function with $0 \le f_i \le \mu_i$,
let $f = \mathbb{E}_i f_i$ and let $g$ be a function with $0 \le g \le 1$. Then

$$\mathbb{E}_{i_1, \dots, i_k \in \{1, \dots, m\}} \langle f_{i_1}, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle - \langle g, *_1(g, g, \dots, g) \rangle$$

differs from

$$\sum_{j=1}^k \langle f - g, \mathbb{E}_{i_{j+1}, \dots, i_k} \circ_j(g, g, \dots, g, f_{i_{j+1}}, \dots, f_{i_k}) \rangle$$

by at most $2\eta$.

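Proof. Note first that

$$\mathbb{E}_{i_1, \dots, i_k} \langle f_{i_1}, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle = \mathbb{E}_{i_1, \dots, i_k} \langle f_{i_1} - g, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle + \mathbb{E}_{i_2, \dots, i_k} \langle g, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle$$

$$= \mathbb{E}_{i_2, \dots, i_k} \langle f - g, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle + \mathbb{E}_{i_2, \dots, i_k} \langle g, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle.$$

Since $0 \le *_1(f_{i_2}, \dots, f_{i_k}) \le *_1(\mu_{i_2}, \dots, \mu_{i_k})$, our assumption implies that, whenever $i_2, \dots, i_k$ are
distinct, $\|*_1(f_{i_2}, \dots, f_{i_k}) - \circ_1(f_{i_2}, \dots, f_{i_k})\|_1 \le \eta$. In this case, therefore,

$$0 \le \langle g, *_1(f_{i_2}, \dots, f_{i_k}) \rangle - \langle g, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle \le \eta.$$

We also know that $\langle g, *_1(f_{i_2}, \dots, f_{i_k}) \rangle = \langle f_{i_2}, *_2(g, f_{i_3}, \dots, f_{i_k}) \rangle$ and that if $i_3, \dots, i_k$ are distinct then
$*_2(g, f_{i_3}, \dots, f_{i_k}) = \circ_2(g, f_{i_3}, \dots, f_{i_k})$. Therefore,

$$0 \le \langle f_{i_2}, \circ_2(g, f_{i_3}, \dots, f_{i_k}) \rangle - \langle g, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle \le \eta.$$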

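Now the assumption that $*_j(1, \dots, 1, \mu_{i_{j+1}}, \dots, \mu_{i_k})$ is bounded above by 2 whenever $j \ge 2$ and
$i_{j+1}, \dots, i_k$ are distinct implies that $\circ_j(g, g, \dots, g, f_{i_{j+1}}, \dots, f_{i_k})$ and $*_j(g, g, \dots, g, f_{i_{j+1}}, \dots, f_{i_k})$ are
equal under these circumstances. From this it is a small exercise to show that

$$\langle f_{i_2}, \circ_2(g, f_{i_3}, \dots, f_{i_k}) \rangle - \langle g, \circ_k(g, g, \dots, g) \rangle = \sum_{j=2}^k \langle f_{i_j} - g, \circ_j(g, g, \dots, g, f_{i_{j+1}}, \dots, f_{i_k}) \rangle.$$

Therefore, for $i_2, \dots, i_k$ distinct,

$$\langle g, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle - \langle g, \circ_k(g, g, \dots, g) \rangle \qquad (1)$$

differs from

$$\sum_{j=2}^k \langle f_{i_j} - g, \circ_j(g, g, \dots, g, f_{i_{j+1}}, \dots, f_{i_k}) \rangle \qquad (2)$$

by at most $\eta$.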

The probability that $i_1, \dots, i_k$ are not distinct is at most $\binom{k}{2} m^{-1} \le \eta/4k$, and if they are not distinct
then the difference between (1) and (2) is certainly no more than $4k$ (since all capped convolutions
take values in $[0, 2]$ and $\|f_{i_j}\|_1 \le \|\mu_{i_j}\|_1 \le 2$). Therefore, taking the expectation over all $(i_1, \dots, i_k)$
(not necessarily distinct) and noting that $\langle g, \circ_k(g, g, \dots, g) \rangle = \langle g, *_1(g, g, \dots, g) \rangle$, we find that

$$\mathbb{E}_{i_1, \dots, i_k} \langle f_{i_1}, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle - \langle g, *_1(g, g, \dots, g) \rangle$$

differs from

$$\sum_{j=1}^k \langle f - g, \mathbb{E}_{i_{j+1}, \dots, i_k} \circ_j(g, g, \dots, g, f_{i_{j+1}}, \dots, f_{i_k}) \rangle$$

by at most $2\eta$, as claimed. □
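
The role of the condition $m \ge 2k^3/\eta$ is to make repeated indices rare: by a union bound over pairs of positions, a uniformly random $k$-tuple of indices from $\{1, \dots, m\}$ has a repetition with probability at most $\binom{k}{2}/m$, and this is at most $\eta/4k$ once $m \ge 2k^3/\eta$. The Python sketch below checks this by brute force for small parameters of our own choosing ($k = 3$, $\eta = 2$, which are not values used in the paper); the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def probability_not_distinct(k, m):
    """Exact probability that a uniform k-tuple from {0, ..., m-1} has a repeat."""
    repeated = sum(1 for idx in product(range(m), repeat=k) if len(set(idx)) < k)
    return Fraction(repeated, m ** k)

k = 3
eta = Fraction(2)          # illustrative value only
m = int(2 * k ** 3 / eta)  # smallest m allowed by the hypothesis m >= 2k^3/eta

exact = probability_not_distinct(k, m)
union_bound = Fraction(comb(k, 2), m)
target = eta / (4 * k)
```

For these parameters `exact` is below `union_bound`, which is in turn below `target`.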

To state our counting lemma, we need to define the norm that we shall actually use. 

Definition 3.6. Let $X$ be a finite set and let $S$ be a homogeneous collection of ordered subsets of $X$
of size $k$. Let $\mu = (\mu_1, \dots, \mu_m)$ and $\nu = (\nu_1, \dots, \nu_m)$ be two sequences of measures on $X$. A $(\mu, \nu)$-basic
anti-uniform function is a function of the form $\circ_j(g_{i_1}, \dots, g_{i_{j-1}}, f_{i_{j+1}}, \dots, f_{i_k})$, where $1 \le j \le k$,
$i_1, \dots, i_k$ are distinct and $0 \le g_{i_h} \le \nu_{i_h}$ and $0 \le f_{i_h} \le \mu_{i_h}$ for every $h$ between 1 and $k$. Let $\Phi_{\mu,\nu}$
be the set of all $(\mu, \nu)$-basic anti-uniform functions and define the norm $\|\cdot\|_{\mu,\nu}$ by taking $\|h\|_{\mu,\nu}$ to be

$$\max\{|\langle h, \phi \rangle| : \phi \in \Phi_{\mu,\nu}\}.$$

The phrase "basic anti-uniform function" is borrowed from Green and Tao, since our basic
anti-uniform functions are closely related to functions of the same name that appear in their paper [22].

Our counting lemma is now as follows. It says that if $\|f - g\|_{\mu,1}$ is small, then the "sparse"
expression given by $\mathbb{E}_{i_1, \dots, i_k \in \{1, \dots, m\}} \langle f_{i_1}, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle$ is approximated by the "dense" expression
$\langle g, *_1(g, g, \dots, g) \rangle$. This lemma modifies Lemma 3.3 in two ways: it splits $f$ up into $m^{-1}(f_1 + \dots + f_m)$ and
it caps all the convolutions that appear when one expands out the expression $\langle f, *_1(f, \dots, f) \rangle$ in terms
of the $f_i$.

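Corollary 3.7. Suppose that the assumptions of Lemma 3.5 hold, and that $|\langle f - g, \phi \rangle| \le \eta/k$ for every
basic anti-uniform function $\phi \in \Phi_{\mu,1}$. Then $\mathbb{E}_x g(x) \ge \mathbb{E}_x f(x) - \eta/k$, and

$$|\mathbb{E}_{i_1, \dots, i_k \in \{1, \dots, m\}} \langle f_{i_1}, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle - \langle g, *_1(g, g, \dots, g) \rangle| \le 4\eta.$$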

Proof. The function $\circ_k(1, 1, \dots, 1)$ is a basic anti-uniform function, and it takes the constant value 1.
Since $\mathbb{E}_x h(x) = \langle h, 1 \rangle$ for any function $h$, this implies the first assertion.

Now the probability that $i_1, \dots, i_k$ are not distinct is again at most $\eta/4k$, and if they are not distinct
we at least know that $|\langle f - g, \circ_j(g, g, \dots, g, f_{i_{j+1}}, \dots, f_{i_k}) \rangle| \le 4$. Therefore, our hypothesis also implies
that

$$\sum_{j=1}^k |\langle f - g, \mathbb{E}_{i_{j+1}, \dots, i_k} \circ_j(g, g, \dots, g, f_{i_{j+1}}, \dots, f_{i_k}) \rangle| \le k(\eta/k) + 4k(\eta/4k) = 2\eta.$$

Combining this with Lemma 3.5, we obtain the result. □

In order to prove analogues of structural results such as the Simonovits stability theorem and the
hypergraph removal lemma we shall need to preserve slightly more information when we replace our
sparsely supported function $f$ by a densely supported function $g$. For example, to prove the stability
theorem, we proceed as follows. Given a subgraph $A$ of the random graph $G_{n,p}$, we create a weighted
subgraph $B$ of $K_n$ that contains the same number of copies of $H$, up to normalization. However, to
make the proof work, we also need the edge-density of $B$ within any large vertex set to correspond
to the edge-density of $A$ within that set. Suppose that we have this property as well and that $A$ is
$H$-free. Then $B$ has very few copies of $H$. A robust version of the stability theorem then tells us that
$B$ may be made $(\chi(H) - 1)$-partite by removing a small number of edges (or rather a small weight of
weighted edges). Let us look at the resulting weighted graph $B'$. It consists of $\chi(H) - 1$ vertex sets, all
of which have zero weight inside. Therefore, in $B$ each of these sets had only a small weight to begin
with. Since all "local densities" of $A$ reflect those of $B$, these vertex sets contain only a very small
proportion of the possible edges in $A$ as well. Removing these edges makes $A$ into a $(\chi(H) - 1)$-partite
graph and we are done.

How do we ensure that local densities are preserved? All we have to do is enrich our set of basic 
anti-uniform functions by adding an appropriate set of functions that will allow us to transfer local 
densities from the sparse structure to the dense one. For example, in the case above we need to 
know that A and B have roughly the same inner product (when appropriately weighted) with the 
characteristic measure of the complete graph on any large set V of vertices. We therefore add these 
characteristic measures to our stock of basic anti-uniform functions. For other applications, we need to 
maintain more intricate local density conditions. However, as we shall see, as long as the corresponding 
set of additional functions is sufficiently small, this does not pose a problem. 

4 A conditional proof of the main theorems

In this section, we shall collect together the results of Sections 2 and 3 in order to make clear what is 
left to prove. We start with a simple and general lemma about duality in normed spaces. 

Lemma 4.1. Let $\Phi$ be a bounded set of real-valued functions defined on a finite set $X$ such that the
linear span of $\Phi$ is $\mathbb{R}^X$. Let a norm on $\mathbb{R}^X$ be defined by $\|f\| = \max\{|\langle f, \phi \rangle| : \phi \in \Phi\}$. Let $\|\cdot\|^*$ be the
dual norm. Then $\|\psi\|^* \le 1$ if and only if $\psi$ belongs to the closed convex hull of $\Phi \cup (-\Phi)$.

Proof. If $\psi = \sum_i \lambda_i \phi_i$ with $\phi_i \in \Phi \cup (-\Phi)$, $\lambda_i \ge 0$ for each $i$ and $\sum_i \lambda_i = 1$, and if $\|f\| \le 1$, then
$|\langle f, \psi \rangle| \le \sum_i \lambda_i |\langle f, \phi_i \rangle| \le 1$. The same is then true if $\psi$ belongs to the closure of the convex hull of
$\Phi \cup (-\Phi)$.

If $\psi$ does not belong to this closed convex hull, then by the Hahn-Banach theorem there must be
a function $f$ such that $|\langle f, \phi \rangle| \le 1$ for every $\phi \in \Phi$ and $\langle f, \psi \rangle > 1$. The first condition tells us that
$\|f\| \le 1$, so the second implies that $\|\psi\|^* > 1$. □

So we already know a great deal about functions $\phi$ with bounded dual norm. Recall, however,
that we must consider positive parts of such functions: we would like to show that $\langle \mu - \nu, \phi_+ \rangle$ is small
whenever $\|\phi\|^*$ is of reasonable size. We need the following extra lemma to gain some control over
these.

Lemma 4.2. Let $\Psi$ be a set of functions that take values in $[-2, 2]$ and let $\epsilon > 0$. Then there exist
constants $d$ and $M$, depending on $\epsilon$ only, such that for every function $\psi$ in the convex hull of $\Psi$, there
is a function $\omega$ that belongs to $M$ times the convex hull of all products $\pm\phi_1 \cdots \phi_j$ with $j \le d$ and
$\phi_1, \dots, \phi_j \in \Psi$, such that $\|\psi_+ - \omega\|_\infty \le \epsilon$.

Proof. We start with the well-known fact that continuous functions on closed bounded intervals can
be uniformly approximated by polynomials. Therefore, if $K(x)$ is the function defined on $[-2, 2]$ that
takes the value $0$ if $x < 0$ and $x$ if $x \ge 0$, then there is a polynomial $P$ such that $|P(x) - K(x)| \le \epsilon$ for
every $x \in [-2, 2]$. It follows that if $\psi$ is a function that takes values in $[-2, 2]$, then $\|P(\psi) - \psi_+\|_\infty \le \epsilon$.

Let us apply this observation in the case where $\psi$ is a convex combination $\sum_i \lambda_i \phi_i$ of functions
$\phi_i \in \Psi$. If $P(t) = \sum_{j=1}^d a_j t^j$, then

$$P(\psi) = \sum_{j=1}^d a_j \sum_{i_1, \dots, i_j} \lambda_{i_1} \cdots \lambda_{i_j} \phi_{i_1} \cdots \phi_{i_j}.$$

But $\sum_{i_1, \dots, i_j} \lambda_{i_1} \cdots \lambda_{i_j} = 1$ for every $j$, so this proves that we can take $M$ to be $\sum_{j=1}^d |a_j|$. This bound
and the degree $d$ depend on $\epsilon$ only, as claimed. □
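
The approximation step can be made tangible with an explicit family of polynomials. The proof only needs the existence of $P$; the Python sketch below uses Bernstein polynomials, which is our choice for illustration and not a construction taken from the paper. For a Lipschitz function they give a uniform error of order $1/\sqrt{d}$, and for $x \mapsto \max(x, 0)$ rescaled from $[-2, 2]$ the standard bound is $2/\sqrt{d}$. The function names are hypothetical.

```python
from math import comb

def positive_part(x):
    """The function K(x) = max(x, 0)."""
    return max(x, 0.0)

def bernstein_positive_part(degree):
    """Return the Bernstein polynomial of the given degree for K on [-2, 2]."""
    # Values of K at the equally spaced nodes -2 + 4i/degree.
    nodes = [positive_part(-2.0 + 4.0 * i / degree) for i in range(degree + 1)]

    def P(x):
        t = (x + 2.0) / 4.0  # rescale [-2, 2] to [0, 1]
        return sum(nodes[i] * comb(degree, i) * t ** i * (1.0 - t) ** (degree - i)
                   for i in range(degree + 1))

    return P

def sup_error(P, grid_points=401):
    """Largest value of |P(x) - K(x)| over an equally spaced grid in [-2, 2]."""
    xs = [-2.0 + 4.0 * i / (grid_points - 1) for i in range(grid_points)]
    return max(abs(P(x) - positive_part(x)) for x in xs)

error_25 = sup_error(bernstein_positive_part(25))
error_100 = sup_error(bernstein_positive_part(100))
```

The error shrinks as the degree grows, which reflects the trade-off in the lemma: a smaller $\epsilon$ forces a larger degree $d$ and, through the coefficients of $P$, a larger constant $M$.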

Similarly, for colouring problems, where we need to deal with the function $(\max_{1 \le i \le r} \phi_i)_+$, we have
the following lemma. The proof is very similar to that of Lemma 4.2, though we must replace the
function $K(x)$ that has to be approximated with the function $K(x_1, \dots, x_r) = \max\{0, x_1, \dots, x_r\}$ and
apply a multivariate version of the uniform approximation theorem inside the set $[-2, 2]^r$ (though the
case we actually need follows easily from the one-dimensional theorem).

Lemma 4.3. Let $\Psi$ be a set of functions that take values in $[-2, 2]$ and let $\epsilon > 0$. Then there exist
constants $d$ and $M$, depending on $\epsilon$ only, such that for every set of functions $\psi_1, \dots, \psi_r$ in the convex
hull of $\Psi$, there is a function $\omega$ that belongs to $M$ times the convex hull of all products $\pm\phi_1 \cdots \phi_j$ with
$j \le d$ and $\phi_1, \dots, \phi_j \in \Psi$, such that $\|(\max_{1 \le i \le r} \psi_i)_+ - \omega\|_\infty \le \epsilon$.

We shall split the rest of the proof of our main result up as follows. First, we shall state a set of 
assumptions about the set S of ordered subsets of X. Then we shall show how the transference results 
we are aiming for follow from these assumptions. Then over the next few sections we shall show how 
to prove these assumptions for a large class of sets S. 

The reason for doing things this way is twofold. First, it splits the proof into a deterministic part 
(the part we do now) and a probabilistic part (verifying the assumptions). Secondly, it splits the 
proof into a part that is completely general (again, the part we do now) and a part that depends 
more on the specific set S. Having said that, when it comes to verifying the assumptions, we do not 
do so for individual sets S. Rather, we identify two broad classes of set S that between them cover 
all the problems that have traditionally interested people. This second shift, from the general to the 
particular, will not be necessary until Section 7. For now, the argument remains quite general. 

Our main theorem concerns a random subset $U = X_p$ with $p \ge p_0$, where $p_0$ will in applications
be within a constant of the smallest it can possibly be. As we have already seen, we shall actually
state a result about a sequence of $m$ random sets $U_1, \dots, U_m$. Suppose that we have chosen them, and
that their associated measures are $\mu_1, \dots, \mu_m$. Let $\mu = m^{-1}(\mu_1 + \dots + \mu_m)$. We shall be particularly
interested in the following four properties that such a sequence of sets may have.

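Four key properties.

0. $\|\mu_i\|_1 = 1 + o(1)$, for each $i$.

1. $\|*_j(\mu_{i_1}, \dots, \mu_{i_{j-1}}, \mu_{i_{j+1}}, \dots, \mu_{i_k}) - \circ_j(\mu_{i_1}, \dots, \mu_{i_{j-1}}, \mu_{i_{j+1}}, \dots, \mu_{i_k})\|_1 \le \eta$ whenever $i_1, \dots, i_{j-1}$,
$i_{j+1}, \dots, i_k$ are distinct integers between 1 and $m$ and $1 \le j \le k$.

2. $\|*_j(1, 1, \dots, 1, \mu_{i_{j+1}}, \dots, \mu_{i_k})\|_\infty \le 2$ for every $j \ge 2$ whenever $i_{j+1}, \dots, i_k$ are distinct integers
between 1 and $m$.

3. $|\langle \mu - 1, \xi \rangle| < \lambda$ whenever $\xi$ is a product of at most $d$ basic anti-uniform functions from $\Phi_{\mu,1}$.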

For the rest of this section we shall assume that $S$ and $p_0$ are such that these four properties hold
with high probability. That this is so for property 0 (which depends on $p_0$ but not on $S$) follows easily
from Chernoff's inequality. Proving that it is also true for properties 1, 2 and 3 will be the main task
that remains after this section. Writing $\Omega(1)$ to stand for a sufficiently large constant, our assumption
is as follows.

Main assumption. Let positive integers $m$ and $d$ and positive constants $\eta$, $\lambda$ be given. Let $U_1, \dots, U_m$
be independent random subsets of $X$, each distributed as $X_p$. Then properties 0-3 hold with probability
$1 - n^{-\Omega(1)}$ whenever $p \ge p_0$.

Sometimes we shall want to focus on just one property. When that is the case, we shall refer to
the assumption that property $j$ holds with probability $1 - n^{-\Omega(1)}$ as assumption $j$.

Before we show how the main assumption allows us to deduce a sparse random version of a density 
theorem from the density theorem itself, we need a simple lemma showing that any density theorem 
implies an equivalent functional formulation. 

Lemma 4.4. Let $k$ be an integer and $\rho, \beta, \epsilon > 0$ be real numbers. Let $X$ be a sufficiently large finite
set and let $S$ be a collection of ordered subsets of $X$ with no repeated elements, each of size $k$. Suppose
that for every subset $B$ of $X$ of size at least $\rho|X|$ there are at least $\beta|S|$ elements $(s_1, \dots, s_k)$ of $S$ such
that $s_i \in B$ for each $i$. Let $g$ be a function on $X$ such that $0 \le g \le 1$ and $\|g\|_1 \ge \rho + \epsilon$. Then

$$\mathbb{E}_{s \in S}\, g(s_1) \cdots g(s_k) \ge \beta - \epsilon.$$

Proof. Let us choose a subset $B$ of $X$ randomly by choosing each $x \in X$ with probability $g(x)$, with
the choices independent. The expected number of elements of $B$ is $\sum_x g(x) \ge (\rho + \epsilon)|X|$ and therefore,
by applying standard large deviation inequalities, one may show that if $|X|$ is sufficiently large the
probability that $|B| < \rho|X|$ is at most $\epsilon$. Therefore, with probability at least $1 - \epsilon$ there are at least
$\beta|S|$ elements $s$ of $S$ such that $s_i \in B$ for every $i$. It follows that the expected number of such sequences
is at least $\beta|S|(1 - \epsilon) \ge (\beta - \epsilon)|S|$. But each sequence $s$ has a probability $g(s_1) \cdots g(s_k)$ of belonging
to $B$, so the expected number is also $\sum_{s \in S} g(s_1) \cdots g(s_k)$, which proves the lemma. □
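
The last step of this proof, that the expected number of sequences of $S$ inside the random set $B$ equals $\sum_{s \in S} g(s_1) \cdots g(s_k)$, can be verified exactly on a toy example by summing over all possible sets $B$. In the Python sketch below the ground set, the family `S` and the function `g` are hypothetical test data of ours; note that no tuple of `S` has a repeated element, which is what makes the product formula valid.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical toy family S on the ground set {0, 1, 2, 3}:
# ordered triples with no repeated elements.
S = [(0, 1, 2), (1, 2, 3), (2, 3, 0), (3, 0, 1), (0, 2, 3)]
g = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4), Fraction(1, 5)]

def expected_count_by_enumeration(S, g):
    """Expectation, over the random set B (each x included independently with
    probability g(x)), of the number of tuples of S lying entirely inside B."""
    total = Fraction(0)
    for membership in product([0, 1], repeat=len(g)):
        # Probability of this particular outcome for B.
        probability = prod(g[x] if membership[x] else 1 - g[x]
                           for x in range(len(g)))
        inside = sum(all(membership[x] for x in s) for s in S)
        total += probability * inside
    return total

by_enumeration = expected_count_by_enumeration(S, g)
by_formula = sum(prod(g[x] for x in s) for s in S)
```

The two values agree exactly, by linearity of expectation together with the independence of the choices.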

Note that the converse to the above result is trivial (and does not need an extra $\epsilon$), since if $B$ is a
set of density $\rho$, then the characteristic function of $B$ has $L_1$ norm $\rho$.

We remark here that the condition that no sequence in S should have repeated elements is not a 
serious restriction. For one thing, all it typically does is rule out degenerate cases (such as arithmetic 
progressions with common difference zero) that do not interest us. Secondly, these degenerate cases 
tend to be sufficiently infrequent that including them would have only a very small effect on the 
constants. (The reason we did not allow them was that it made the proof neater.) 

With this in hand, we are now ready, conditional on the main assumption, to prove that a
transference principle holds for density theorems. We remark that in the proof we do not use the full strength
of assumption 1, since we use only the result for the 1-convolutions. The more general statement
about $j$-convolutions is used later, when we shall show that assumption 1 implies assumption 3.

Theorem 4.5. Let $k$ be a positive integer and let $\rho, \beta, \epsilon > 0$ be real numbers. Let $X$ be a sufficiently
large finite set and let $S$ be a homogeneous collection of ordered subsets of $X$ with no repeated elements,
each of size $k$. Suppose that for every subset $B$ of $X$ of size at least $\rho|X|$ there are at least $\beta|S|$ elements
$(s_1, \dots, s_k)$ of $S$ such that $s_i \in B$ for each $i$. Then there are positive constants $\eta$ and $\lambda$ and positive
integers $d$ and $m$ with the following property.

Let $p_0$ be such that the main assumption holds for the pair $(S, p_0)$ and the constants $\eta$, $\lambda$, $d$ and
$m$. Let $p \ge p_0$ and let $U_1, \dots, U_m$ be independent random subsets of $X$, with each element of $X$
belonging to each $U_i$ with probability $p$ and with all choices independent. Let the associated measures
of $U_1, \dots, U_m$ be $\mu_1, \dots, \mu_m$ and let $\mu = m^{-1}(\mu_1 + \dots + \mu_m)$. Then with probability $1 - n^{-\Omega(1)}$ we have
the following sparse density theorem:

$$\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k) \ge \beta - \epsilon \text{ whenever } 0 \le f \le \mu \text{ and } \mathbb{E}_x f(x) \ge \rho + \epsilon.$$

Proof. To begin, we apply Lemma 4.4 with $\epsilon/2$ to conclude that if $g$ is any function on $X$ with $0 \le g \le 1$
and $\|g\|_1 \ge \rho + \epsilon/2$, then, for $|X|$ sufficiently large,

$$\mathbb{E}_{s \in S} g(s_1) \cdots g(s_k) \ge \beta - \frac{\epsilon}{2}.$$

For each function $h$, let $\|h\|$ be defined to be the maximum of $|\langle h, \phi \rangle|$ over all basic anti-uniform
functions $\phi \in \Phi_{\mu,1}$. Let $\eta$ = Jq-. We claim that, given $f$ with $0 \le f \le \mu$, there exists a $g$ with
$0 \le g \le 1$ such that $\|(1 + \epsilon/4)^{-1} f - g\| \le \eta/k$. Equivalently, this shows that $|\langle (1 + \epsilon/4)^{-1} f - g, \phi \rangle| \le \eta/k$
for every $\phi \in \Phi_{\mu,1}$. We will prove this claim in a moment. However, let us first note that it is a
sufficient condition to imply that

$$\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k) \ge \beta - \epsilon$$

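whenever $0 \le f \le \mu$ and $\mathbb{E}_x f(x) \ge \rho + \epsilon$. Let m = 20 /rj and write $(1 + \epsilon/4)^{-1} f$ as $m^{-1}(f_1 + \dots + f_m)$
with $0 \le f_i \le \mu_i$. Corollary 3.7, together with properties 1 and 2, then implies that
$\mathbb{E}_x g(x) \ge (1 + \epsilon/4)^{-1} \mathbb{E}_x f(x) - \eta/k$ and that

$$\left| \mathbb{E}_{i_1, \dots, i_k \in \{1, \dots, m\}} \langle f_{i_1}, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle - \langle g, *_1(g, g, \dots, g) \rangle \right| \le 4\eta.$$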
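Since $\eta/k \le \epsilon/8$, $(1 + \epsilon/4)^{-1} \ge 1 - \epsilon/4$ and $1 + o(1) \ge \mathbb{E}_x f(x) \ge \rho + \epsilon$,

$$\mathbb{E}_x g(x) \ge (1 + \epsilon/4)^{-1} \mathbb{E}_x f(x) - \eta/k \ge \rho + \epsilon - \frac{\epsilon}{4} - \frac{\epsilon}{8} - o(1) \ge \rho + \frac{\epsilon}{2},$$

for $|X|$ sufficiently large, so our assumption about $g$ implies that $\langle g, *_1(g, g, \dots, g) \rangle \ge \beta - \epsilon/2$. Since in
addition $8\eta \le \epsilon$, we can deduce the inequality $\mathbb{E}_{i_1, \dots, i_k} \langle f_{i_1}, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle \ge \beta - \epsilon$ which, since the
capped convolution is smaller than the standard convolution, implies that

$$\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k) = \langle f, *_1(f, f, \dots, f) \rangle \ge \mathbb{E}_{i_1, \dots, i_k} \langle f_{i_1}, \circ_1(f_{i_2}, \dots, f_{i_k}) \rangle \ge \beta - \epsilon.$$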

It remains to prove that for any $f$ with $0 \le f \le \mu$, there exists a $g$ with $0 \le g \le 1$ such that
$\|(1 + \epsilon/4)^{-1} f - g\| \le \eta/k$. An application of Lemma 2.5 tells us that if $\langle \mu - 1, \psi_+ \rangle < \epsilon/4$ for every
function $\psi$ with $\|\psi\|^* \le k\eta^{-1}$, then this will indeed be the case. Now let us try to find a sufficient
condition for this. First, if $\|\psi\|^* \le k\eta^{-1}$, then Lemma 4.1 implies that $\psi$ is contained in $k\eta^{-1}$ times
the convex hull of $\Phi \cup \{-\Phi\}$, where $\Phi$ is the set of all basic anti-uniform functions. Since functions in
$\Phi \cup \{-\Phi\}$ take values in $[-2, 2]$, we can apply Lemma 4.2 to find constants $d$ and $M$ and a function
$\omega$ that can be written as $M$ times a convex combination of products of at most $d$ functions from
$\Phi \cup \{-\Phi\}$, such that $\|\psi_+ - \omega\|_\infty \le \epsilon/20$. Hence, for such an $\omega$,

$$\langle \mu - 1, \psi_+ - \omega \rangle \le \|\mu - 1\|_1 \|\psi_+ - \omega\|_\infty \le (2 + o(1)) \frac{\epsilon}{20} < \frac{\epsilon}{8}$$

for $|X|$ sufficiently large. From this it follows that if $|\langle \mu - 1, \xi \rangle| < \epsilon/8M$ whenever $\xi$ is a product of at
most $d$ functions from $\Phi \cup \{-\Phi\}$, then

$$\langle \mu - 1, \psi_+ \rangle = \langle \mu - 1, \omega \rangle + \langle \mu - 1, \psi_+ - \omega \rangle < \epsilon/8 + \epsilon/8 = \epsilon/4.$$

Therefore, applying property 3 with $d$ and $\lambda = \epsilon/8M$ completes the proof. □

We would also like to prove a corresponding theorem for colouring problems. Again, we will need 
a lemma saying that colouring theorems always have a functional reformulation. 

Lemma 4.6. Let $k, r$ be positive integers and let $\beta > 0$ be a real number. Let $X$ be a sufficiently large
finite set and let $S$ be a collection of ordered subsets of $X$ with no repeated elements, each of size $k$.
Suppose that for every $r$-colouring of $X$ there are at least $\beta|S|$ elements $(s_1, \dots, s_k)$ of $S$ such that
each $s_i$ has the same colour. Let $g_1, \dots, g_r$ be functions from $X$ to $[0, 1]$ such that $g_1 + \dots + g_r = 1$.
Then

$$\mathbb{E}_{s \in S} \sum_{i=1}^{r} g_i(s_1) \cdots g_i(s_k) \ge \beta.$$



Proof. Define a random $r$-colouring of $X$ as follows. For each $x \in X$, let $x$ have colour $i$ with probability
$g_i(x)$, and let the colours be chosen independently. By hypothesis, the number of monochromatic
sequences is at least $\beta|S|$, regardless of what the colouring is. But the expected number of monochromatic
sequences is $\sum_{s \in S} \sum_{i=1}^{r} g_i(s_1) \cdots g_i(s_k)$, so the lemma is proved. □
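
The expectation formula in this proof can be confirmed by brute force on a tiny example. This is only a sketch under stated assumptions: the ground set (integers mod 5), the sequences (three-term progressions with nonzero difference), the two weight functions and the function names are all hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product


def mono_expectation_by_enumeration(X, S, gs):
    """Average the number of monochromatic sequences over all colourings,
    where x receives colour i independently with probability gs[i][x]."""
    total = Fraction(0)
    for colouring in product(range(len(gs)), repeat=len(X)):
        prob = Fraction(1)
        for x, c in zip(X, colouring):
            prob *= gs[c][x]
        colour = dict(zip(X, colouring))
        mono = sum(1 for s in S if len({colour[x] for x in s}) == 1)
        total += prob * mono
    return total


def mono_expectation_by_formula(S, gs):
    """Closed form: sum over s in S and colours i of g_i(s_1) ... g_i(s_k)."""
    total = Fraction(0)
    for s in S:
        for gi in gs:
            term = Fraction(1)
            for x in s:
                term *= gi[x]
            total += term
    return total


# Hypothetical toy data: two colours on Z_5 with g_1 + g_2 = 1 pointwise.
X5 = list(range(5))
S5 = [(a, (a + d) % 5, (a + 2 * d) % 5) for a in range(5) for d in range(1, 5)]
g1 = {x: Fraction(x + 1, 6) for x in X5}
g2 = {x: 1 - g1[x] for x in X5}
```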

We actually need a slightly stronger conclusion than the one we have just obtained. However, if S 
is homogeneous then it is an easy matter to strengthen the above result to what we need. 

Lemma 4.7. Let $k, r$ be positive integers and let $\beta > 0$ be a real number. Let $X$ be a sufficiently large
finite set and let $S$ be a homogeneous collection of ordered subsets of $X$ with no repeated elements,
each of size $k$. Suppose that for every $r$-colouring of $X$ there are at least $\beta|S|$ elements $(s_1, \dots, s_k)$
of $S$ such that each $s_i$ has the same colour. Then there exists $\delta > 0$ with the following property. If
$g_1, \dots, g_r$ are any $r$ functions from $X$ to $[0, 1]$ such that $g_1(x) + \dots + g_r(x) \ge 1/2$ for at least $(1 - \delta)|X|$
values of $x$, then

$$\mathbb{E}_{s \in S} \sum_{i=1}^{r} g_i(s_1) \cdots g_i(s_k) \ge 2^{-(k+1)} \beta.$$

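Proof. Let $Y$ be the set of $x$ such that $g_1(x) + \dots + g_r(x) < 1/2$. Then we can find functions $h_1, \dots, h_r$
from $X$ to $[0, 1]$ such that $h_1 + \dots + h_r = 1$ and $h_i(x) \le 2 g_i(x)$ for every $x \in X \setminus Y$. By the previous
lemma, we know that

$$\mathbb{E}_{s \in S} \sum_{i=1}^{r} h_i(s_1) \cdots h_i(s_k) \ge \beta.$$

Let $T$ be the set of sequences $s \in S$ such that $s_i \in Y$ for at least one $i$. Since $S$ is homogeneous, for
each $i$ the set of $s$ such that $s_i \in Y$ has size $|S||Y|/|X| \le \delta|S|$. Therefore, $|T| \le k\delta|S|$. It follows that

$$\sum_{s \in S} \sum_{i=1}^{r} g_i(s_1) \cdots g_i(s_k) \ge \sum_{s \in S \setminus T} \sum_{i=1}^{r} g_i(s_1) \cdots g_i(s_k)$$
$$\ge 2^{-k} \sum_{s \in S} \sum_{i=1}^{r} h_i(s_1) \cdots h_i(s_k) - |T|$$
$$\ge (2^{-k}\beta - k\delta)|S|.$$

Thus, the lemma is proved if we take $\delta = 2^{-(k+1)}\beta/k$. □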
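
For the reader's convenience, here is the arithmetic behind the final choice of $\delta$, written out as a short worked step. It assumes the reading $\delta = 2^{-(k+1)}\beta/k$ of the last line of the proof and the lower bound $(2^{-k}\beta - k\delta)|S|$ obtained just before it.

```latex
\[
  2^{-k}\beta - k\delta
  \;=\; 2^{-k}\beta - k \cdot \frac{2^{-(k+1)}\beta}{k}
  \;=\; 2^{-k}\beta - 2^{-(k+1)}\beta
  \;=\; 2^{-(k+1)}\beta ,
\]
```

so the sum over $s \in S$ is at least $2^{-(k+1)}\beta|S|$, which is the bound that the proof of Theorem 4.8 later draws on.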

We now prove our main transference principle for colouring theorems. The proof is similar to that 
of Theorem 4.5 and reduces to the same three conditions, but we include the proof for completeness. 

Theorem 4.8. Let $k, r$ be positive integers and $\beta > 0$ be a real number. Let $X$ be a sufficiently large
finite set and let $S$ be a homogeneous collection of ordered subsets of $X$ with no repeated elements,
each of size $k$. Suppose that for every $r$-colouring of $X$ there are at least $\beta|S|$ elements $(s_1, \dots, s_k)$
of $S$ such that each $s_i$ has the same colour. Then there are positive constants $\eta$ and $\lambda$ and positive
integers $d$ and $m$ with the following property.

Let $p_0$ be such that the main assumption holds for the pair $(S, p_0)$ and the constants $\eta, \lambda, d$ and
$m$. Let $p \ge p_0$ and let $U_1, \dots, U_m$ be independent random subsets of $X$, with each element of $X$
belonging to each $U_i$ with probability $p$ and with all choices independent. Let the associated measures
of $U_1, \dots, U_m$ be $\mu_1, \dots, \mu_m$ and let $\mu = m^{-1}(\mu_1 + \dots + \mu_m)$. Then with probability 1 - n~^^^^ we have
the following sparse colouring theorem:

$$\mathbb{E}_{s \in S} \Big( \sum_{i=1}^{r} f_i(s_1) \cdots f_i(s_k) \Big) \ge 2^{-(k+2)}\beta \quad \text{whenever } 0 \le f_i \le \mu \text{ and } \sum_{i=1}^{r} f_i = \mu.$$

Proof. An application of Lemmas 4.6 and 4.7 tells us that there exists $\delta > 0$ with the following
property. If $g_1, \dots, g_r$ are any $r$ functions from $X$ to $[0, 1]$ such that $g_1(x) + \dots + g_r(x) \ge 1/2$ for at
least $(1 - \delta)|X|$ values of $x$, then, for $|X|$ sufficiently large,

$$\mathbb{E}_{s \in S} \sum_{i=1}^{r} g_i(s_1) \cdots g_i(s_k) \ge 2^{-(k+1)}\beta.$$

Again we define the norm $\|\cdot\|$ by taking $\|h\|$ to be the maximum of $|\langle h, \phi \rangle|$ over all basic anti-uniform
functions $\phi \in \Phi_{\mu,1}$. Let $\eta$ be such that $8\eta r < \min(\delta, 2^{-(k+1)}\beta)$. We claim that, given functions
$f_1, \dots, f_r$ with $0 \le f_i \le \mu$ and $\sum_{i=1}^{r} f_i = \mu$, there are functions $g_i$ such that $0 \le g_i \le 1$, $g_1 + \dots + g_r \le 1$
and $\|(1 + \delta/4)^{-1} f_i - g_i\| \le \eta/k$. Equivalently, this means that $|\langle (1 + \delta/4)^{-1} f_i - g_i, \phi \rangle| \le \eta/k$ for every $i$
and every $\phi \in \Phi_{\mu,1}$. We will return to the proof of this statement. For now, let us show that it implies

$$\mathbb{E}_{s \in S} \Big( \sum_{i=1}^{r} f_i(s_1) \cdots f_i(s_k) \Big) \ge 2^{-(k+2)}\beta.$$

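Let m = 2k^ /rj and write $(1 + \delta/4)^{-1} f_i$ as $m^{-1}(f_{i,1} + \dots + f_{i,m})$ with $0 \le f_{i,j} \le \mu_j$. Corollary 3.7,
together with properties 1 and 2, then implies that $\mathbb{E}_x g_i(x) \ge (1 + \delta/4)^{-1} \mathbb{E}_x f_i(x) - \eta/k$ and that

$$\left| \mathbb{E}_{j_1, \dots, j_k \in \{1, \dots, m\}} \langle f_{i,j_1}, \circ_1(f_{i,j_2}, \dots, f_{i,j_k}) \rangle - \langle g_i, *_1(g_i, g_i, \dots, g_i) \rangle \right| \le 4\eta.$$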

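Suppose that there were at least $\delta|X|$ values of $x$ for which $\sum_{i=1}^{r} g_i(x) < 1/2$. Then, since
$g_1 + \dots + g_r \le 1$ everywhere, this would imply that

$$\sum_{i=1}^{r} \mathbb{E}_x g_i(x) \le 1 - \frac{\delta}{2}.$$

But $\mathbb{E}_x g_i(x) \ge (1 + \delta/4)^{-1} \mathbb{E}_x f_i(x) - \eta/k$. Therefore, adding over all $i$, we have, since $\eta \le \delta/8r$ and
$(1 + \delta/4)^{-1} \ge 1 - \delta/4$, that

$$\sum_{i=1}^{r} \mathbb{E}_x g_i(x) \ge \Big(1 + \frac{\delta}{4}\Big)^{-1} (1 + o(1)) - \frac{\delta}{8} > 1 - \frac{\delta}{2},$$

for $|X|$ sufficiently large, a contradiction. Our assumption about the $g_i$ therefore implies the inequality
$\sum_{i=1}^{r} \langle g_i, *_1(g_i, g_i, \dots, g_i) \rangle \ge 2^{-(k+1)}\beta$. Since $8r\eta \le 2^{-(k+1)}\beta$, we can deduce the inequality

$$\sum_{i=1}^{r} \mathbb{E}_{j_1, \dots, j_k} \langle f_{i,j_1}, \circ_1(f_{i,j_2}, \dots, f_{i,j_k}) \rangle \ge 2^{-(k+2)}\beta$$

which, since the capped convolution is smaller than the standard convolution, implies that

$$\mathbb{E}_{s \in S} \sum_{i=1}^{r} f_i(s_1) \cdots f_i(s_k) = \sum_{i=1}^{r} \langle f_i, *_1(f_i, f_i, \dots, f_i) \rangle \ge 2^{-(k+2)}\beta.$$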
As in Theorem 4.5, we have proved our result conditional upon an assumption, this time that for any
functions $f_1, \dots, f_r$ with $0 \le f_i \le \mu$ and $\sum_{i=1}^{r} f_i = \mu$ there are functions $g_i$ such that $0 \le g_i \le 1$,
$g_1 + \dots + g_r \le 1$ and $\|(1 + \delta/4)^{-1} f_i - g_i\| \le \eta/k$. An application of Lemma 2.6 tells us that if
$\langle \mu - 1, (\max_{1 \le i \le r} \psi_i)_+ \rangle < \delta/4$ for every collection of functions $\psi_i$ with $\|\psi_i\|^* \le k\eta^{-1}$, then this will
indeed be the case. By Lemma 4.1, each $\psi_i$ is contained in $k\eta^{-1}$ times the convex hull of $\Phi \cup \{-\Phi\}$,
where $\Phi$ is the set of all basic anti-uniform functions. Since functions in $\Phi \cup \{-\Phi\}$ take values in
$[-2, 2]$, we can apply Lemma 4.3 to find constants $d$ and $M$ and a function $\omega$ that can be written
as $M$ times a convex combination of products of at most $d$ functions from $\Phi \cup \{-\Phi\}$, such that
$\|(\max_{1 \le i \le r} \psi_i)_+ - \omega\|_\infty \le \delta/20$. From this it follows that if $|X|$ is sufficiently large and $|\langle \mu - 1, \xi \rangle| <
\delta/8M$ whenever $\xi$ is a product of at most $d$ functions from $\Phi \cup \{-\Phi\}$, then $\langle \mu - 1, (\max_{1 \le i \le r} \psi_i)_+ \rangle < \delta/4$.
Therefore, applying property 3 with $d$ and $\lambda = \delta/8M$ proves the theorem. □

Finally, we would like to talk a little about structure theorems. To motivate the result that we 
are about to state, let us begin by giving a very brief sketch of how to prove a sparse version of the 
triangle-removal lemma. (For a precise statement, see Conjecture 1.7 in the introduction, and the 
discussion preceding it.) 

The dense version of the lemma states that if a dense graph has almost no triangles, then it is 
possible to remove a small number of edges in order to make it triangle free. To prove this, one first 
applies Szemeredi's regularity lemma to the graph, and then removes all edges from pairs that are
sparse or irregular. Because sparse pairs contain few edges, and very few pairs are irregular, not many 
edges are removed. If a triangle is left in the resulting graph, then each edge of the triangle belongs 
to a dense regular pair, and then a simple lemma can be used to show that there must be many 
triangles in the graph. Since we are assuming that there are very few triangles in the graph, this is a 
contradiction. 

The sparse version of the lemma states that essentially the same result holds in a sparse random
graph, given natural interpretations of phrases such as "almost no triangles". If a random graph with
$n$ vertices has edge probability $p$, then the expected number of (labelled) triangles is approximately
$p^3 n^3$, and the expected number of (labelled) edges is $p n^2$. Therefore, the obvious statement to try to
prove, given a random graph $G_0$ with edge probability $p$, is this: for every $\delta > 0$ there exists $\epsilon > 0$ such
that if $G$ is any subgraph of $G_0$ that contains at most $\epsilon p^3 n^3$ triangles, then it is possible to remove at
most $\delta p n^2$ edges from $G$ and end up with no triangles.
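
To make the counts in this paragraph concrete, the following sketch computes the exact expectations $n(n-1)(n-2)p^3$ and $n(n-1)p$ that the text approximates by $p^3 n^3$ and $p n^2$, and counts labelled triangles in an explicit graph. The function names and the vertex labelling by `range(n)` are assumptions made for the example.

```python
from itertools import permutations


def expected_labelled_triangles(n, p):
    """Expected number of ordered vertex triples spanning a triangle in G(n, p)."""
    return n * (n - 1) * (n - 2) * p ** 3


def expected_labelled_edges(n, p):
    """Expected number of ordered vertex pairs joined by an edge in G(n, p)."""
    return n * (n - 1) * p


def count_labelled_triangles(n, edges):
    """Count ordered triples (a, b, c) of distinct vertices spanning a triangle."""
    adj = {frozenset(e) for e in edges}
    return sum(
        1
        for a, b, c in permutations(range(n), 3)
        if frozenset((a, b)) in adj
        and frozenset((b, c)) in adj
        and frozenset((a, c)) in adj
    )


# Sanity check on the complete graph K_5, where p = 1 and nothing is random.
complete_k5 = [(a, b) for a in range(5) for b in range(a + 1, 5)]
```

With these conventions, the hypothesis of the sparse removal statement reads `count_labelled_triangles(n, G) <= eps * p**3 * n**3` for a subgraph `G` of the random graph.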

How might one prove such a statement? The obvious idea is to use the transference methods
explained earlier to find a $[0, 1]$-valued function $g$ defined on pairs of vertices (which we can think of
as a weighted directed graph) that has similar triangle-containing behaviour to $G$. For the sake of
discussion, let us suppose that $g$ is in fact the characteristic function of a graph (which by standard
techniques we can ensure), and let us call that graph $\Gamma$.

If $\Gamma$ has similar behaviour to $G$, then $\Gamma$ contains very few triangles, which is promising. So we
apply the dense triangle-removal lemma in order to get rid of all triangles. But what does that tell
us about $G$? The edges we removed from $\Gamma$ did not belong to $G$. And in any case, how do we use an
approximate statement (that $G$ and $\Gamma$ have similar triangle-containing behaviour) to obtain an exact
conclusion (that $G$ with a few edges removed has no triangles at all)?

The answer is that we removed edges from $\Gamma$ in "clumps". That is, we took pairs $(U, V)$ of vertex
sets (given by cells of the Szemeredi partition) and removed all edges linking $U$ to $V$. So the natural
way of removing edges from $G$ is to remove the same clumps that we removed from $\Gamma$. After that, the
idea is that if $G$ contains a triangle then it belongs to clumps that were not removed, which means
that $\Gamma$ must contain a triple of dense regular clumps, and therefore many triangles, which implies that
$G$ must also contain many triangles, a contradiction.

For this to work, it is vital that if a clump contains a very small proportion of the edges of $\Gamma$, then
it should also contain a very small proportion of the edges of $G$. More generally, the density of $G$ in
a set of the form $U \times V$ should be about $p$ times the density of $\Gamma$ in the same set. Thus, we need
a result that allows us to approximate a function by one with a similar triangle count, but we also
need the new function to have similar densities inside every set of the form $U \times V$ when $U$ and $V$ are
reasonably large.

In the case of hypergraphs, we need a similar but more complicated statement. The precise nature 
of the complexity is, rather surprisingly, not too important: the main point is that we shall need to 
approximate a function dominated by a sparse random measure by a bounded function that has a 
similar simplex count and similar densities inside all the sets from some set system that is not too 
large. 

In order to state the result precisely, we make the following definition. 

Definition 4.9. Suppose that we have a finite set $X$ and suppose that $\Phi_{\mu,1}$ is a collection of basic
anti-uniform functions derived from a set of sequences $S$ and a measure $\mu$. Then, given a collection of
subsets $\mathcal{V}$ of $X$, we define the set of basic anti-uniform functions $\Phi_{\mu,1}(\mathcal{V})$ to be $\Phi_{\mu,1} \cup \{\chi_V : V \in \mathcal{V}\}$,
where $\chi_V$ is the characteristic function of the set $V$.

We also need to modify the third of the key properties, so as to take account of the set system $\mathcal{V}$.

3'. $|\langle \mu - 1, \xi \rangle| < \lambda$ whenever $\xi$ is a product of at most $d$ basic anti-uniform functions from $\Phi_{\mu,1}(\mathcal{V})$.

We also need to modify our main assumption about $(S, p_0)$ to reflect this. (It now becomes an
assumption about $S$, $p_0$ and $\mathcal{V}$.)

Main assumption. Let positive integers $m$ and $d$ and positive constants $\eta, \lambda$ be given. Let $U_1, \dots, U_m$
be independent random subsets of $X$, each distributed as $X_p$. Then properties 0, 1, 2 and 3' hold with
probability 1 - n~^^^^ whenever $p \ge p_0$.

Our main abstract result regarding the transfer of structural theorems, proved conditional on the
main assumption above, is the following. It says that not only do the functions $f$ and $g$ reflect one
another in the sense that they have similar subset counts, but they may be chosen to have similar
densities inside all the sets $V$ from a collection $\mathcal{V}$. The proof, which we omit, is essentially the same
as that of Theorem 4.5: the only difference is that the norm is now defined in terms of $\Phi_{\mu,1}(\mathcal{V})$, which
gives us the extra information that $|\langle f, \chi_V \rangle - \langle g, \chi_V \rangle| \le \|f - g\|$ for every $V \in \mathcal{V}$ and hence the extra
conclusion at the end.

Theorem 4.10. Let $k$ be a positive integer and $\epsilon > 0$ a constant. Let $X$ be a finite set, $S$ a homogeneous
collection of ordered subsets of $X$ of size $k$ and $\mathcal{V}$ a collection of subsets of $X$. Then there are
positive constants $\eta$ and $\lambda$ and positive integers $d$ and $m$ with the following property.

Let $p_0$ be such that the main assumption holds for the triple $(S, p_0, \mathcal{V})$ and the constants $\eta, \lambda, d$
and $m$. Let $U_1, \dots, U_m$ be independent random sets $X_p$ with $p \ge p_0$. Let their associated measures be
$\mu_1, \dots, \mu_m$ and let $\mu = m^{-1}(\mu_1 + \dots + \mu_m)$. Then with probability 1 - n ^^^^ the following theorem
holds: whenever $0 \le f \le \mu$, there exists $g$ with $0 \le g \le 1$ such that

$$\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k) \ge \mathbb{E}_{s \in S} g(s_1) \cdots g(s_k) - \epsilon$$

and, for all $V \in \mathcal{V}$,

$$|\mathbb{E}_{x \in V} f(x) - \mathbb{E}_{x \in V} g(x)| \le \epsilon \frac{|X|}{|V|}.$$

We remark that the second part of the conclusion can be rewritten as

$$|\mathbb{E}_x \chi_V(x) f(x) - \mathbb{E}_x \chi_V(x) g(x)| \le \epsilon,$$

which is precisely the statement that $|\langle f, \chi_V \rangle - \langle g, \chi_V \rangle| \le \epsilon$. If $|V|$ is small compared with $|X|$ then
the conclusion is vacuous, so it is a simple way of making a statement about large sets without
explicitly insisting that they are large.

Note that for property 3' to have a chance of holding, we cannot have too many sets in the collection
$\mathcal{V}$. However, we can easily have enough sets for our purposes. For instance, the collection of pairs of
vertex sets in a graph with $n$ vertices has size $4^n$; this is far smaller than the number of graphs on $n$
vertices, which is exponential in $n^2$. More generally, an important role in the hypergraph regularity
lemma is played by $k$-uniform hypergraphs $H$ formed as follows: take a $(k - 1)$-uniform hypergraph
$K$ and for each set $E$ of size $k$, put $E$ into $H$ if and only if all its subsets of size $k - 1$ belong to $K$.
Since there are far fewer $(k - 1)$-uniform hypergraphs than there are $k$-uniform hypergraphs, we have
no trouble applying our result.
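
The size comparison in this paragraph is easy to tabulate. The sketch below simply evaluates the two counts mentioned in the text; the function names are hypothetical.

```python
from math import comb


def num_vertex_set_pairs(n):
    """Number of ordered pairs (U, V) of subsets of an n-element vertex set."""
    return 4 ** n


def num_labelled_graphs(n):
    """Number of graphs on n labelled vertices."""
    return 2 ** comb(n, 2)


def exponents(n):
    """Base-2 exponents of the two counts: 2n versus n(n-1)/2."""
    return 2 * n, comb(n, 2)
```

For example, `exponents(10)` returns `(20, 45)`, so at ten vertices the collection of pairs is already smaller than the family of graphs by a factor of $2^{25}$, and the gap widens quadratically in the exponent as $n$ grows.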

With these three theorems the section is finished. Our ultimate goal is to prove a probabilistic 
theorem (or rather, a collection of related probabilistic theorems). To do this, we have isolated certain 
key properties that imply our conclusions, and packaged them into the main assumptions above. In 
that way we have, up to now, avoided any probabilistic arguments. But obviously they are going to 
have to come in at some point, and that point is now. The task that remains is to verify that our 
main assumptions are indeed true for the particular pairs $(S, p_0)$ or triples $(S, p_0, \mathcal{V})$ that interest us.

5 Small correlation with a fixed function 

In this section and the next, we shall make some progress on the task just mentioned by showing 
that assumption 3 (or assumption 3') follows from assumptions 1 and 2. It might seem more natural 
to prove assumptions 1 and 2 first; the reason we do not is twofold. First, we want to separate the 
argument cleanly into two parts, one that says that a random collection of sets has certain properties, 
and the other that says that those properties imply our main theorems. That is, we want to isolate 
as much as we can the probabilistic input into the result. (The deduction of assumptions 3 and 3' 
from assumptions 1 and 2 involves a few averaging arguments, but that does not count as probabilistic 
input since the averaging arguments are based on serious probabilistic assumptions.) The second 
reason is that we want to distinguish clearly between those parts of our argument that are completely 
general, and those parts that depend on specific characteristics of the set S of sequences. For now, 
our argument continues to be general. 

Assuming, then, that properties 1 and 2 hold with high probability, and given constants $\lambda$ and $d$,
we would like to show that, with high probability, $|\langle \mu - 1, \xi \rangle| < \lambda$ for every product $\xi$ of at most $d$ basic
anti-uniform functions. This is a somewhat complicated statement, since the set of basic anti-uniform
functions depends on our random variable $\mu$. In this section we prove a much easier result: we shall
show that, for any fixed bounded function $\xi$, $|\langle \mu - 1, \xi \rangle| < \lambda$ with high probability.

To prove this, we shall need a standard probabilistic estimate, known as Bernstein's inequality, 
which allows one to bound the sum of independent and not necessarily identically distributed random 
variables. 

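Lemma 5.1. Let $Y_1, Y_2, \dots, Y_n$ be independent random variables. Suppose that each $Y_i$ lies in the
interval $[0, M]$. Let $S = Y_1 + Y_2 + \dots + Y_n$. Then

$$\mathbb{P}(|S - \mathbb{E}(S)| \ge t) \le \exp\left\{ \frac{-t^2}{2\left(\sum_i \mathbb{V}(Y_i) + Mt/3\right)} \right\}.$$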



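We are now ready to prove that $\langle \mu - 1, \psi \rangle$ is bounded with high probability, for any fixed bounded
function $\psi$.

Lemma 5.2. Let $X$ be a finite set and let $U = X_p$. Let $\mu$ be the associated measure of $U$. Then, for
any constants $C$ and $\lambda$ with $C \ge \lambda$ and any positive function $\psi$ with $\|\psi\|_\infty \le C$,

$$\mathbb{P}(|\langle \mu - 1, \psi \rangle| \ge \lambda) \le e^{-\lambda^2 p |X| / 3C^2}.$$

Proof. For each $x \in X$, $\mu(x)\psi(x)$ is a random variable that takes values in the interval $[0, p^{-1}C]$. The
expectation of $\mu(x)\psi(x)$ is $\psi(x)$, so the expectation of $\langle \mu - 1, \psi \rangle$ is 0. Also, the variance of $\mu(x)\psi(x)$
is at most $\mathbb{E}(\mu(x)^2 \psi(x)^2)$, which is $\psi(x)^2 p^{-1}$, which is at most $C^2 p^{-1}$.

Let $S = \sum_{x \in X} \mu(x)\psi(x)$. Then the probability that $|\langle \mu - 1, \psi \rangle| \ge \lambda$ equals the probability that
$|S - \mathbb{E}S| \ge \lambda|X|$. Therefore, by Bernstein's inequality, Lemma 5.1,

$$\mathbb{P}(|\langle \mu - 1, \psi \rangle| \ge \lambda) \le \exp\left\{ \frac{-(\lambda|X|)^2}{2\left(C^2 p^{-1}|X| + C\lambda p^{-1}|X|/3\right)} \right\}$$
$$= \exp\left\{ \frac{-\lambda^2 p |X|}{2(C^2 + C\lambda/3)} \right\}$$
$$\le \exp\{-\lambda^2 p |X| / 3C^2\},$$

where to prove the second inequality we used the assumption that $C \ge \lambda$. □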
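
The last step of this proof is purely algebraic: when $C \ge \lambda$ we have $2(C^2 + C\lambda/3) \le 8C^2/3 \le 3C^2$, so the exponent only gets larger in absolute value. The sketch below evaluates the two bounds numerically for a few parameter choices. The function names and the sample parameters are assumptions for illustration, and the code checks only this algebraic comparison, not Bernstein's inequality itself.

```python
from math import exp


def bernstein_step(lam, C, p, size):
    """Intermediate bound exp(-lam^2 p |X| / (2 (C^2 + C lam / 3)))."""
    return exp(-(lam ** 2) * p * size / (2 * (C ** 2 + C * lam / 3)))


def final_bound(lam, C, p, size):
    """Simplified bound exp(-lam^2 p |X| / (3 C^2))."""
    return exp(-(lam ** 2) * p * size / (3 * C ** 2))


def denominators(lam, C):
    """The two denominators being compared, valid to compare when C >= lam."""
    return 2 * (C ** 2 + C * lam / 3), 3 * C ** 2


# Hypothetical sample parameters (lam, C, p, |X|), each with C >= lam.
samples = [(0.1, 1.0, 0.01, 10000), (0.5, 0.5, 0.2, 500), (1.0, 3.0, 0.05, 2000)]
```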

Before we move on to the next section, it will be helpful to state Chernoff's inequality, the standard
estimate for the tails of the binomial distribution. As we have already noted, property 0 of our main
assumption is a straightforward consequence of this lemma.

Lemma 5.3. Let p,6 > be real numbers and X a finite set. Then 

m\Xp\ -p\X\\ > Sp\X\) < 2e-'^'fW. 
6 The set of basic anti-uniform functions has few extreme points



In order to understand the rest of the deduction of assumption 3 from assumptions 1 and 2, it is
important to understand the difficulty that we now have to overcome. The result of the previous
section tells us that for any measure $\mu$ and any given function $\xi$, $|\langle \mu - 1, \xi \rangle|$ is bounded with high
probability. We now need to show that this is the case for all functions $\xi$ that are products of at most
$d$ basic anti-uniform functions. As we have already commented, this is tricky, because which functions
are basic anti-uniform functions depends on $\mu$.

To get a clearer notion of the problem, let us look at a subcase of the general fact that we are trying
to prove, by thinking how we might try to show that, with high probability, $\langle \mu_1 - 1, \circ_1(f_2, \dots, f_k) \rangle$
is small whenever $0 \le f_i \le \mu_i$ for $i = 2, 3, \dots, k$. That is, for the time being we shall concentrate on
basic anti-uniform functions themselves rather than on products of such functions.

A question that will obviously be important to us is the following: for how many choices of functions
$f_2, \dots, f_k$ do we need to establish that $|\langle \mu_1 - 1, \circ_1(f_2, \dots, f_k) \rangle|$ is small? At first glance, the answer
might seem to be infinitely many, but one quickly realizes that a small uniform perturbation to the
functions $f_2, \dots, f_k$ does not make much difference to $\langle \mu_1 - 1, \circ_1(f_2, \dots, f_k) \rangle$. So it will be enough to
look at some kind of net of the functions.

However, even this observation is not good enough, since the number of functions in a net will
definitely be at least exponentially large in $p|X|$. Although the probability we calculated in the
previous section is exponentially small in $p|X|$, the constant is small, whereas the constant involved
in the size of a net will not be small. So it looks as though there are too many events to consider.

It is clear that the only way round this problem is to prove that the set of basic anti-uniform
functions $\circ_1(f_2, \dots, f_k)$ is somehow smaller than expected. And once one thinks about this for a bit,
one realizes that this may well be the case. So far, we have noted that $\circ_1(f_2, \dots, f_k)$ is not much
affected by small uniform perturbations to the functions $f_i$. However, an important theme in additive
combinatorics is that convolutions tend to be robust under a much larger class of perturbations:
roughly speaking, a "quasirandom" perturbation of one of the $f_i$ is likely to have little effect on
$\circ_1(f_2, \dots, f_k)$.
It is not immediately obvious how to turn this vague idea into a precise one, so for a moment let
us think more abstractly. We have a class $\Gamma$ of functions, and a function $\nu$, and we would like to
prove that $\langle \nu, \phi \rangle$ is small for every $\phi \in \Gamma$. To do this, we would like to identify a much smaller class
of functions $\Delta$ such that if $\langle \nu, \psi \rangle$ is small for every $\psi \in \Delta$ then $\langle \nu, \phi \rangle$ is small for every $\phi \in \Gamma$. The
following very simple lemma tells us a sufficient (and also in fact necessary) condition on $\Delta$ for us to
be able to make this deduction.

Lemma 6.1. Let $\Gamma$ and $\Delta$ be two closed sets of functions from $X$ to $\mathbb{R}$ and suppose that both are
centrally symmetric. Then the following two statements are equivalent:

(i) For every function $\nu$, $\max\{|\langle \nu, \phi \rangle| : \phi \in \Gamma\} \le \max\{|\langle \nu, \psi \rangle| : \psi \in \Delta\}$.

(ii) $\Gamma$ is contained in the convex hull of $\Delta$.

Proof. The statement we shall use is just the easy direction of this equivalence, which is that (ii)
implies (i). To see this, let $\phi \in \Gamma$. Then we can write $\phi$ as a convex combination $\sum_i \lambda_i \psi_i$ of elements
of $\Delta$, and that implies that $|\langle \nu, \phi \rangle| \le \sum_i \lambda_i |\langle \nu, \psi_i \rangle|$. If $|\langle \nu, \psi \rangle| \le t$ for every $\psi \in \Delta$, then this is at
most $t$, which proves (i), since $\nu$ and $\phi$ were arbitrary.

Now let us suppose that $\Gamma$ is not contained in the convex hull of $\Delta$, and let $\phi$ be an element of $\Gamma$
that does not belong to this convex hull. Then the Hahn-Banach theorem and the fact that $\Delta$ is closed
and centrally symmetric guarantee the existence of a function $\nu$ such that $\langle \nu, \phi \rangle > 1$, but $|\langle \nu, \psi \rangle| \le 1$
for every $\psi \in \Delta$, which contradicts (i). □
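
The easy direction of Lemma 6.1 is just the triangle inequality for a convex combination, and it can be illustrated numerically. In the sketch below the inner product is taken to be the normalised sum over a four-point ground set, and the functions, weights and names are all hypothetical choices made for the example.

```python
from fractions import Fraction


def inner(u, v):
    """Normalised inner product <u, v> = E_x u(x) v(x) on a finite ground set."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v)) / len(u)


def convex_combination(weights, functions):
    """Pointwise convex combination sum_i weights[i] * functions[i]."""
    assert sum(weights) == 1 and all(w >= 0 for w in weights)
    size = len(functions[0])
    return [sum(w * f[x] for w, f in zip(weights, functions)) for x in range(size)]


# Hypothetical centrally symmetric set Delta and a point phi of its convex hull.
base = [(1, 0, -1, 2), (0, 3, 1, -1), (-2, 1, 1, 0)]
delta = base + [tuple(-v for v in f) for f in base]
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
phi = convex_combination(weights, base)

nu = (3, -1, 2, 5)  # an arbitrary test function
lhs = abs(inner(nu, phi))
rhs = max(abs(inner(nu, psi)) for psi in delta)
```

Here `lhs <= rhs` holds for every choice of `nu`, which is statement (i) restricted to this one element of the convex hull.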

The reason Lemma 6.1 is useful is that it gives us a strategy for proving that $|\langle \mu - 1, \xi \rangle|$ is small
for all products $\xi$ of at most $d$ basic anti-uniform functions: try to show that these functions belong to
the convex hull of a much smaller set. In fact, this is not quite what we shall do. Rather, we shall
show that every $\xi$ can be approximated by an element of the convex hull of a much smaller set. To
prepare for the more elaborate statement we shall use, we need another easy lemma.

The statement of the lemma is not quite what one might expect. The reason for this is that the 
simplest notion of approximation, namely uniform approximation, is too much for us to hope to attain. 
Instead, we go for a kind of weighted uniform approximation, where we allow the functions to differ 
quite a lot, but only in a few specified places. 

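Lemma 6.2. Let $H$ be a non-negative function defined on $X$ such that $\|H\|_1 \le \epsilon$ and $\|H\|_\infty \le R$. Let
$U = X_p$ and let $\mu$ be the associated measure of $U$. Then, with probability at least $1 - \exp(-\epsilon^2 p |X| / 3R^2)$,
we have the estimate $|\langle \mu - 1, \phi \rangle - \langle \mu - 1, \psi \rangle| \le 3\epsilon$ for every pair of functions $\phi$ and $\psi$ such that
$|\phi - \psi| \le H$.

Proof. The fact that $\|H\|_1 \le \epsilon$ implies that $|\langle 1, \phi \rangle - \langle 1, \psi \rangle| \le \epsilon$ as well. Also, $|\langle \mu, \phi - \psi \rangle| \le \langle \mu, H \rangle$.
Therefore, it remains to estimate the probability that $\langle \mu, H \rangle > 2\epsilon$. Lemma 5.2 with $\lambda = \epsilon$ and $C = R$
implies that the probability that $\langle \mu - 1, H \rangle > \epsilon$ is smaller than $\exp(-\epsilon^2 p |X| / 3R^2)$. Therefore, with
probability at least $1 - \exp(-\epsilon^2 p |X| / 3R^2)$, $\langle \mu, H \rangle \le 2\epsilon$. The result follows. □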
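
Written out as a single chain, the estimate behind "the result follows" is as below. This is a sketch that assumes the inner product is the normalised one, $\langle f, g \rangle = \mathbb{E}_x f(x) g(x)$, so that $\langle 1, H \rangle = \|H\|_1$, and it holds on the event that $\langle \mu - 1, H \rangle \le \epsilon$.

```latex
\begin{align*}
|\langle \mu - 1, \phi \rangle - \langle \mu - 1, \psi \rangle|
  &= |\langle \mu, \phi - \psi \rangle - \langle 1, \phi - \psi \rangle| \\
  &\le \langle \mu, |\phi - \psi| \rangle + \langle 1, |\phi - \psi| \rangle \\
  &\le \langle \mu, H \rangle + \|H\|_1 \\
  &= \langle \mu - 1, H \rangle + 2\|H\|_1 \\
  &\le \epsilon + 2\epsilon = 3\epsilon .
\end{align*}
```

The second line uses that $\mu$ is non-negative, and the third uses the pointwise bound $|\phi - \psi| \le H$.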

If we use Lemma 6.1 and Lemma 6.2 in combination, then we can show the following result. 

Corollary 6.3. Let $H$ be a non-negative function defined on $X$, and suppose that $\|H\|_1 \le \epsilon$ and
$\|H\|_\infty \le R$. Let $U = X_p$ and let $\mu$ be the associated measure of $U$. Let $\Gamma$ and $\Delta$ be two sets of
functions, and suppose that for every $\phi \in \Gamma$ there exists $\psi$ in the convex hull of $\Delta$ such that $|\phi - \psi| \le H$.
Then

$$\max\{|\langle \mu - 1, \phi \rangle| : \phi \in \Gamma\} \le \max\{|\langle \mu - 1, \psi' \rangle| : \psi' \in \Delta\} + 3\epsilon$$

with probability at least $1 - \exp(-\epsilon^2 p |X| / 3R^2)$.

Proof. By Lemma 6.2, the probability is at least $1 - \exp(-\epsilon^2 p |X| / 3R^2)$ that $|\langle \mu - 1, \phi \rangle - \langle \mu - 1, \psi \rangle| \le 3\epsilon$
whenever $|\phi - \psi| \le H$. By the easy direction of Lemma 6.1, $|\langle \mu - 1, \psi \rangle| \le \max\{|\langle \mu - 1, \psi' \rangle| : \psi' \in \Delta\}$
for every $\psi$ in the convex hull of $\Delta$. This proves the result. □

How do we define an appropriate set of functions A? A simple observation gets us close to the set 
we need, but we shall need to make a definition before we can explain it. 

Definition 6.4. Let $0 < q \le p \le 1$, $U = X_p$ and let $V = U_{q/p}$. Let $\mu$ be the associated measure of
$U$ and let $\nu$ be the associated measure of $V$ considered as a set distributed as $X_q$. Let $f$ be a function
with $0 \le f \le \mu$. Then the normalized restriction $f_\nu$ of $f$ to $V$ is the function defined by taking
$f_\nu(x) = (p/q) f(x)$ if $x \in V$ and 0 otherwise.

The normalization is chosen to give us the following easy lemma. Note that the expectation below 
is a "probabilistic" expectation rather than a mere average over a finite set. 
Lemma 6.5. Let $U = X_p$ be a set with associated measure $\mu$ and let $V = U_{q/p}$ be a random subset of
$U$. Then, for any function $0 \le f \le \mu$, $f = \mathbb{E}_V f_\nu$.

Proof. For each $x \in U$ we have

$$\mathbb{E}_V f_\nu(x) = (p/q) f(x) \, \mathbb{P}[x \in V] = f(x),$$

and for each $x \notin U$ we have $f(x) = \mathbb{E}_V f_\nu(x) = 0$. □
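
Lemma 6.5 can be verified exactly on a toy example by enumerating every subset $V$ of $U$ with its probability. The following sketch does this with rational arithmetic; the ground set, the set `U`, the values of `p` and `q`, the function `f` and all names are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import combinations


def normalized_restriction(f, V, p, q):
    """f_nu(x) = (p/q) f(x) if x is in V, and 0 otherwise."""
    return {x: (p / q) * fx if x in V else Fraction(0) for x, fx in f.items()}


def average_over_random_subsets(f, U, p, q):
    """Exact expectation of f_nu when each element of U is kept in V
    independently with probability q/p."""
    keep = q / p
    elements = sorted(U)
    avg = {x: Fraction(0) for x in f}
    for size in range(len(elements) + 1):
        for V in combinations(elements, size):
            prob = keep ** size * (1 - keep) ** (len(elements) - size)
            fV = normalized_restriction(f, set(V), p, q)
            for x in f:
                avg[x] += prob * fV[x]
    return avg


# Hypothetical toy data on X = {0, ..., 5}: f vanishes off U and is at most 1/p.
p, q = Fraction(1, 2), Fraction(1, 6)
U = {0, 2, 3, 5}
f = {0: Fraction(2), 1: Fraction(0), 2: Fraction(1),
     3: Fraction(1, 2), 4: Fraction(0), 5: Fraction(0)}
```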

This lemma expresses $f$ as a convex combination of normalized restrictions, which is potentially
useful to us, since if $q/p$ is a small constant, then a typical contribution to this convex combination
comes from a restriction to a set that is quite a lot smaller than $U$. That will allow us to find a small
net of the set of all possible restrictions whose convex hull can be used to approximate the set of all
possible functions $f$ with $0 \le f \le \mu$.

Furthermore, we can use Lemma 6.5 to write convolutions and products of convolutions as convex 
combinations as well, as the next lemma easily implies. The lemma itself is very easy as well so we 
omit the proof. 

Lemma 6.6. For $1 \le i \le m$ let $f_i$ be a fixed function and let $g_i$ be a random function such that
$f_i = \mathbb{E} g_i$. Let $\kappa(f_1, \dots, f_m)$ be a multilinear form in the functions $f_1, \dots, f_m$. Then

$$\kappa(f_1, \dots, f_m) = \mathbb{E} \kappa(g_1, \dots, g_m).$$

The rough idea, and the third main idea of the paper, is to rewrite every product of convolutions 
as an average of products of basic anti-uniform functions built out of normalized restrictions. Since 
there are "fewer" of these, we will have fewer events that need to hold. We must, however, be careful 
when we apply this idea. For a start, we cannot afford to let $q$ become too small. If $q$ is too small,
then, given associated measures $\nu_1, \dots, \nu_k$ of sets distributed as $X_q$, we can no longer guarantee that
the convolutions $*_1(\nu_2, \dots, \nu_k)$ are sufficiently well-behaved to have small inner product with $\mu - 1$ for
almost every $\mu$. Even when we choose $q$ to be large enough, there will still be certain rogue choices of
sets. However, for q sufficiently large, we can take care of these rogue sets by averaging and showing 
that they make a small contribution. Thus, there is a delicate balance involved: q has to be small 
enough to give rise to a small class of functions, but large enough for these functions to have the 
properties required of them. 

Another problem arises from the form of the basic anti-uniform functions. Recall that our starting
point is a collection of (randomly chosen) sets $U_1, \dots, U_m$ with associated measures $\mu_1, \dots, \mu_m$. The
basic anti-uniform functions we want to approximate are of the form $\circ_j(g, \dots, g, f_{j+1}, \dots, f_k)$, where
$0 \le g \le 1$ and $0 \le f_h \le \mu_{i_h}$ for some sequence $(i_1, \dots, i_k)$ of integers between 1 and $m$. Therefore, we
must approximate capped convolutions of functions some of which are bounded above by 1 and some
of which are bounded above by associated measures of sparse random sets. This creates a difficulty
for us. It is still true that if $V = X_q$ is a random set with associated measure $\nu$, then $g = \mathbb{E}_V g_\nu$, but
if we exploit that fact directly, then the number of sets $V$ that we have to consider is on the order
of $\binom{|X|}{q|X|}$. This is much larger than $\exp(cp|X|)$ for any constant $c$ and therefore too large to use in a
probabilistic estimate given by a simple union bound. To get round this problem we shall find a much
smaller set $\mathcal{V}$ such that $g = \mathbb{E}_{V \in \mathcal{V}} g_\nu$.

We shall need the following piece of notation to do this. Suppose that the elements of the set $X$
are ordered as $x_1, \dots, x_n$, say. Then, given a subset $V = \{x_{j_1}, \dots, x_{j_l}\}$ of $X$ and an integer $a$ between
0 and $n - 1$, we define the set $V + a$ to be the set formed by translating the indices by $a$. That is,
$V + a = \{x_{j_1 + a}, \dots, x_{j_l + a}\}$ where the sums are taken modulo $n$. (This "translation" operation has no
mathematical significance: it just turns out to be a convenient way of defining a small collection of
sets.)

Let us write $\nu + a$ for the characteristic measure of V + a. Write $g_{\nu+a}$ for the function given by
$(|X|/|V|)\,g(x)$ if $x \in V + a$ and 0 otherwise. A proof almost identical to that of Lemma 6.5 implies that
$g = \mathbb{E}_a g_{\nu+a}$ for any function g, and in particular for any function g such that $0 \le g \le 1$. From this
and Lemma 6.5 itself it follows that if $W_1, \dots, W_j$ are any subsets of X, and $\omega_i$ is the characteristic
measure of $W_i$, then

$$*_j(g, \dots, g, f_{i_{j+1}}, \dots, f_{i_k}) = \mathbb{E}_{a_1, \dots, a_j, V_{j+1}, \dots, V_k}\, *_j\big(g_{\omega_1+a_1}, \dots, g_{\omega_j+a_j}, (f_{i_{j+1}})_{\nu_{j+1}}, \dots, (f_{i_k})_{\nu_k}\big),$$

where for h > j the set $V_h$ is distributed as $(U_{i_h})_{q/p}$. There is one practical caveat, in that this
identity holds when $\nu$ is a characteristic measure, but it is more natural for us to deal with associated
measures. However, if we assume that V was chosen with probability q and $|V| = (1 + o(1))q|X|$, then
the distinction vanishes and the identity above holds (up to an o(1) term) with associated measures
rather than characteristic ones.
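The averaging identity $g = \mathbb{E}_a g_{\nu+a}$ can be checked directly on a small example. The following Python sketch is only an illustration: it assumes that X is identified with $\{0, \dots, n-1\}$ and that the characteristic measure of V takes the value |X|/|V| on V and 0 elsewhere, and the function names are ours rather than part of the argument.

```python
from fractions import Fraction

def translate(V, a, n):
    """Return V + a: shift every index in V by a, modulo n."""
    return {(j + a) % n for j in V}

def normalized_restriction(g, V, n):
    """Return g_nu for the characteristic measure nu of V (assumed to be
    n/|V| on V): g_nu(x) = (n/|V|) * g(x) if x is in V, and 0 otherwise."""
    scale = Fraction(n, len(V))
    return [scale * g[x] if x in V else Fraction(0) for x in range(n)]

def average_over_translates(g, V, n):
    """Return E_a g_{nu+a}, averaging over all n translates of V."""
    total = [Fraction(0)] * n
    for a in range(n):
        g_a = normalized_restriction(g, translate(V, a, n), n)
        total = [t + y for t, y in zip(total, g_a)]
    return [t / n for t in total]

n = 7
V = {0, 2, 3}
g = [Fraction(x, 10) for x in range(n)]   # any function with 0 <= g <= 1
assert average_over_translates(g, V, n) == g
```

The identity is exact because each point x lies in exactly |V| of the n translates V + a, so only n sets are needed rather than all sets of a given size.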

This observation is encouraging, because it represents the convolution $*_j(g, \dots, g, f_{i_{j+1}}, \dots, f_{i_k})$ as
an average of convolutions from a small class of functions. However, it certainly does not solve our
problems completely, since we need a statement about capped convolutions. Of course, it would be
very surprising if it did solve our problems, since so far we have not said anything about the size of
q and the sets $W_1, \dots, W_j$. In order to transfer the trivial observation above from convolutions to
capped convolutions, we shall need q to be sufficiently large and the $W_i$ to be "sufficiently random".

6.1 Sufficient randomness 

Let us define what we mean by "sufficiently random" and then show that what we need can be obtained 
from our assumption 1. The next definition highlights the property that we want to get out of the 
sufficient randomness: that capped convolutions should be pretty similar to actual convolutions. 

Definition 6.7. We shall say that a sequence of measures $\nu_1, \dots, \nu_{j-1}, \nu_{j+1}, \dots, \nu_k$ is $(\eta, j)$-good if

$$\| *_j(\nu_1, \dots, \nu_{j-1}, \nu_{j+1}, \dots, \nu_k) - \circ_j(\nu_1, \dots, \nu_{j-1}, \nu_{j+1}, \dots, \nu_k) \|_1 \le \eta.$$

The randomness property we need of our sets $W_i$ is, roughly speaking, that almost all sequences of
sets that appear in the averages we consider are $(\eta, j)$-good for every j. That will allow us to prove a
statement about capped convolutions, because almost all the convolutions that appear in the average
in the observation above can then be approximated by their capped counterparts. Here is the formal
definition.

Definition 6.8. Let $\eta > 0$ be a real number, let $0 < q \le p \le 1$ and let $U_1, \dots, U_m$ be subsets of X
chosen independently with probability p. Let $\omega_1, \dots, \omega_k$ be the associated measures of sets $W_1, \dots, W_k$
chosen independently with probability q. We say that $U_1, \dots, U_m$ and $\omega_1, \dots, \omega_k$ are sufficiently random
if $|W_i| = (1 + o(1))q|X|$ for every $1 \le i \le k$ and, for every j between 1 and k and every sequence
$(i_{j+1}, \dots, i_k)$ of distinct integers between 1 and m, the following statement holds.
• Let a sequence $(a_1, \dots, a_{j-1}, V_{i_{j+1}}, \dots, V_{i_k})$ be chosen randomly such that $a_h \in \mathbb{Z}_n$ for every h < j
and $V_{i_h} = (U_{i_h})_{q/p}$ for every h > j, and for each h > j let $\nu_{i_h}$ be the associated measure of $V_{i_h}$.
Then the probability that the (k - 1)-tuple $(\omega_1 + a_1, \dots, \omega_{j-1} + a_{j-1}, \nu_{i_{j+1}}, \dots, \nu_{i_k})$ is $(\eta, j)$-good
is at least $1 - o(|X|^{-k})$.

Strictly speaking, the definition of sufficient randomness depends on the parameters $\eta$, p and q,
but these will be clear from the context.

Our next main lemma shows that, if assumption 1 holds and we choose $U_1, \dots, U_m$ and
$\omega_1, \dots, \omega_k$ independently at random in a suitable way, then they are sufficiently random with high probability.

Lemma 6.9. Suppose that assumption 1 holds and let $1 \ge p \ge q \ge p_0$. Let $W_1, \dots, W_k$ be independent
random subsets of X with each $W_i = X_q$, and let $U_1, \dots, U_m$ be independent random sets with each
$U_i = X_p$. Let the associated measures of $W_1, \dots, W_k$ be $\omega_1, \dots, \omega_k$, respectively. Then the probability
that $U_1, \dots, U_m, \omega_1, \dots, \omega_k$ are sufficiently random is $1 - |X|^{-\omega(1)}$.

Proof. Fix some j and some sequence $i_{j+1}, \dots, i_k$ of integers between 1 and m. Consider the following
way of choosing k - 1 random sets. First we choose $W_1, \dots, W_k$ and $U_1, \dots, U_m$ as in the statement of
the lemma. Chernoff's inequality easily implies that with probability $1 - |X|^{-\omega(1)}$ we have $|W_i| = (1 +
o(1))q|X|$ for all $1 \le i \le k$. Next, we choose $a_1, \dots, a_{j-1}, V_{i_{j+1}}, \dots, V_{i_k}$ randomly and independently
such that $a_h \in \mathbb{Z}_n$ for every h < j and $V_{i_h} = (U_{i_h})_{q/p}$ for every h > j. Then the sets
$W_1 + a_1, \dots, W_{j-1} + a_{j-1}, V_{i_{j+1}}, \dots, V_{i_k}$ are independent random sets, each distributed as $X_q$.

Let their associated measures be $\omega_1 + a_1, \dots, \omega_{j-1} + a_{j-1}, \nu_{i_{j+1}}, \dots, \nu_{i_k}$. Then assumption 1 tells
us that this sequence of measures is $(\eta, j)$-good with probability $1 - |X|^{-\omega(1)}$. Since the number of
choices of j and $i_{j+1}, \dots, i_k$ is bounded, the same is true for all such choices, again with probability
$1 - |X|^{-\omega(1)}$.

It follows that, with probability $1 - |X|^{-\omega(1)}$ when we choose the $W_i$ and the $U_i$, the probability
that for every j and $i_{j+1}, \dots, i_k$ the sequence $\omega_1 + a_1, \dots, \omega_{j-1} + a_{j-1}, \nu_{i_{j+1}}, \dots, \nu_{i_k}$ is $(\eta, j)$-good,
conditional on that choice of the $W_i$ and $U_i$, is at least $1 - |X|^{-\omega(1)}$. Therefore, the sets and measures
$U_1, \dots, U_m, \omega_1, \dots, \omega_k$ are sufficiently random with probability $1 - |X|^{-\omega(1)}$, as claimed. □

6.2 The proof for basic anti-uniform functions 

Let $U_1, \dots, U_m$ and $W_1, \dots, W_k$ be two sequences of sets, with each $U_i$ distributed as $X_p$ and each $W_i$
as $X_q$, where $Lq = p \ge q \ge p_0$ for some large constant L and all choices are independent. It will be by
choosing this constant L to be large enough that we will make our trick of using normalized restrictions
work. Let $\omega_1, \dots, \omega_k$ be the associated measures of $W_1, \dots, W_k$, and suppose that $U_1, \dots, U_m$ and
$\omega_1, \dots, \omega_k$ are sufficiently random. That will be our basic set-up for this subsection. We shall also
assume that the $U_i$ satisfy properties 1 and 2 for a certain sufficiently small value of the parameter
$\eta$, and, by property 0, that they each have size at most 2p|X|. Again, this latter property holds
with high probability by Chernoff's inequality. The task of showing that properties 1 and 2 also hold
with high probability, and therefore that this section is not vacuous, will be left to a later section. In
the statements of our lemmas, we shall take all these assumptions for granted, rather than explicitly
stating them every single time.

Before our next lemma we need some definitions. Let the associated measures of $U_1, \dots, U_m$ be
$\mu_1, \dots, \mu_m$, and let $\omega_{h,a}$ denote the associated measure of $W_h + a$ (or, more accurately, the translate
by a of the associated measure $\omega_h$ of $W_h$). We will also write $\omega'_{h,a}$ for the characteristic measure of
$W_h + a$.

Definition 6.10. For any j and any sequence $(i_{j+1}, \dots, i_k)$ of distinct integers between 2 and m, let
$\Phi(i_{j+1}, \dots, i_k)$ be the set of all basic anti-uniform functions $\circ_j(g, \dots, g, f_{j+1}, \dots, f_k)$, where $0 \le g \le 1$
and $0 \le f_h \le \mu_{i_h}$ for each h between j + 1 and k. Let $\Phi'(i_{j+1}, \dots, i_k)$ be the set of all functions
$\circ_j(f_1, \dots, f_{j-1}, f_{j+1}, \dots, f_k)$ such that the constituent functions $f_h$ have the following properties. If
$1 \le h \le j - 1$, then $0 \le f_h \le \omega'_{h,a}$ for some a, and if $j + 1 \le h \le k$ then $0 \le f_h \le \nu_h$, where $\nu_h$ is the
associated measure of some set $V_h \in (U_{i_h})_{q/p}$ such that $|V_h| \le 2q|X|$.

We shall now show that every function in $\Phi(i_{j+1}, \dots, i_k)$ can be approximated by a convex
combination of functions in $\Phi'(i_{j+1}, \dots, i_k)$. This will be very useful to us, because $\Phi'(i_{j+1}, \dots, i_k)$ is a
much "smaller" set than $\Phi(i_{j+1}, \dots, i_k)$. However, we need to be rather careful about precisely what
we mean by "can be approximated by".

Lemma 6.11. Let L and $\alpha < 1$ be positive constants. Then there exists a positive constant $\eta$ depending
on $\alpha$ such that, for |X| sufficiently large depending on L and $\alpha$, the following holds. For every positive
integer $j \le k$ and every sequence $(i_{j+1}, \dots, i_k)$ of distinct integers between 2 and m, there is a
non-negative function $H = H(i_{j+1}, \dots, i_k)$ such that $\|H\|_1 \le \alpha$, $\|H\|_\infty \le 2$, and, if $\circ_j(g, \dots, g, f_{j+1}, \dots, f_k)$
is any function in $\Phi(i_{j+1}, \dots, i_k)$, then there exists a function $\sigma$ in the convex hull of $\Phi'(i_{j+1}, \dots, i_k)$
such that

$$0 \le \circ_j(g, \dots, g, f_{j+1}, \dots, f_k) - \sigma \le H.$$

Proof. Let us choose a random function $\psi$ in $\Phi'(i_{j+1}, \dots, i_k)$ as follows. We start by choosing a
random sequence $(\omega'_1, \dots, \omega'_{j-1}, \nu_{j+1}, \dots, \nu_k)$. Here, each $\omega'_h$ is chosen uniformly at random from the
|X| measures $\omega'_{h,a}$ and each $\nu_h$ is the associated measure of a set $V_h$, where the sets $V_h$ are independent
and distributed as $(U_{i_h})_{q/p}$. We then let $\psi$ be the function

$$\circ_j\big(g_{\omega'_1}, \dots, g_{\omega'_{j-1}}, (f_{j+1})_{\nu_{j+1}}, \dots, (f_k)_{\nu_k}\big)$$

if every $V_h$ has size at most 2q|X|, and the zero function otherwise. Finally, we take $\sigma$ to be the
expectation of $\psi$, which is certainly a convex combination of functions in $\Phi'(i_{j+1}, \dots, i_k)$.

Let us begin with the first inequality. Here we shall prove the slightly stronger result that the
inequality holds even if we take $\psi = \circ_j(g_{\omega'_1}, \dots, g_{\omega'_{j-1}}, (f_{j+1})_{\nu_{j+1}}, \dots, (f_k)_{\nu_k})$ for all choices of $V_h$
(rather than setting it to be zero when one of the $V_h$ is too large).

Without loss of generality, $(i_1, \dots, i_k) = (1, 2, \dots, k)$. Let T be the function from $\mathbb{R}$ to $\mathbb{R}$ defined
by $T(y) = \min\{y, 2\}$ and let $S(y) = y - T(y) = \max\{y - 2, 0\}$. Then

$$\circ_j(g, \dots, g, f_{j+1}, \dots, f_k) = T\big(*_j(g, \dots, g, f_{j+1}, \dots, f_k)\big) = T\big(\mathbb{E}\, *_j(g_{\omega'_1}, \dots, g_{\omega'_{j-1}}, (f_{j+1})_{\nu_{j+1}}, \dots, (f_k)_{\nu_k})\big),$$

by Lemmas 6.5 and 6.6 and the fact that $g = \mathbb{E}_a g_{\omega'_{h,a}}$ (the reason we use characteristic measures here
is so that this identity works). On the other hand,

$$\mathbb{E}\, \circ_j\big(g_{\omega'_1}, \dots, g_{\omega'_{j-1}}, (f_{j+1})_{\nu_{j+1}}, \dots, (f_k)_{\nu_k}\big) = \mathbb{E}\, T\big(*_j(g_{\omega'_1}, \dots, g_{\omega'_{j-1}}, (f_{j+1})_{\nu_{j+1}}, \dots, (f_k)_{\nu_k})\big).$$

Since T is a concave function, the result follows from Jensen's inequality.
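To make the use of Jensen's inequality concrete, here is a small Python sketch at a single point x. The sample values are hypothetical stand-ins for the uncapped convolution under equally likely random choices; they are not taken from the argument. The sketch checks that $\mathbb{E}\,T(Z) \le T(\mathbb{E}Z)$ and that the gap is at most $\mathbb{E}\,S(Z)$.

```python
from fractions import Fraction

def T(y):
    """Cap at 2: turns a convolution value into a capped convolution value."""
    return min(y, 2)

def S(y):
    """Excess over the cap, so that y = T(y) + S(y)."""
    return max(y - 2, 0)

def mean(values):
    """Average of a list of equally likely values."""
    return sum(values, Fraction(0)) / len(values)

# Hypothetical values of an uncapped convolution at one point x, one per
# equally likely random choice of the translates and the sets V_h.
samples = [Fraction(1, 2), Fraction(3), Fraction(5, 2), Fraction(1)]

capped_mean = T(mean(samples))                    # T(E Z)
mean_of_capped = mean([T(z) for z in samples])    # E T(Z)

assert mean_of_capped <= capped_mean              # Jensen for the concave map T
assert capped_mean - mean_of_capped <= mean([S(z) for z in samples])
assert all(z == T(z) + S(z) for z in samples)
```

The second assertion is the pointwise reason why a bound on $\mathbb{E}\,S$ of the uncapped convolutions controls the approximation error.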
Now let us define H and prove the second inequality. Since capped convolutions are smaller than
convolutions and, as above, $*_j(g, \dots, g, f_{j+1}, \dots, f_k) = \mathbb{E}\, *_j(g_{\omega'_1}, \dots, g_{\omega'_{j-1}}, (f_{j+1})_{\nu_{j+1}}, \dots, (f_k)_{\nu_k})$,
the left-hand side of this inequality is at most $\mathbb{E}\big(*_j(g_{\omega'_1}, \dots, g_{\omega'_{j-1}}, (f_{j+1})_{\nu_{j+1}}, \dots, (f_k)_{\nu_k}) - \psi\big)$, which
is $\mathbb{E}\tau$, where

$$\tau = S\big(*_j(g_{\omega'_1}, \dots, g_{\omega'_{j-1}}, (f_{j+1})_{\nu_{j+1}}, \dots, (f_k)_{\nu_k})\big)$$

if every $V_h$ has size at most 2q|X|, and $\tau = *_j(g_{\omega'_1}, \dots, g_{\omega'_{j-1}}, (f_{j+1})_{\nu_{j+1}}, \dots, (f_k)_{\nu_k})$ otherwise.

If every $V_h$ has size at most 2q|X|, then $\tau \le S(*_j(\omega'_1, \dots, \omega'_{j-1}, \nu_{j+1}, \dots, \nu_k))$, since S is an
increasing function. If there is some set $V_h$ which is too large, then we use the bound

$$*_j\big(g_{\omega'_1}, \dots, g_{\omega'_{j-1}}, (f_{j+1})_{\nu_{j+1}}, \dots, (f_k)_{\nu_k}\big) \le |X|^{k-1} *_j(1, \dots, 1) = |X|^{k-1},$$

which follows since $g_{\omega'_h} \le \omega'_h \le |X|/|W_h| \le |X|$ and $(f_h)_{\nu_h} \le (p/q)\mu_{i_h} \le 1/q \le |X|$ for every h.

Since, by Chernoff's inequality, the probability that some one of the $V_h$ has size larger than 2q|X| is
exponentially small in q|X|, the contribution of these bad terms is o(1) everywhere.
We also have the trivial bound

$$\circ_j(g, \dots, g, f_{j+1}, \dots, f_k) - \sigma \le 2.$$

Accordingly, if we set $H = \min\{\eta + \mathbb{E}\, S(*_j(\omega'_1, \dots, \omega'_{j-1}, \nu_{j+1}, \dots, \nu_k)), 2\}$, then, provided |X| is
sufficiently large, we have a function that satisfies the second inequality and trivially satisfies the inequality
$\|H\|_\infty \le 2$.

It remains to bound $\|H\|_1$. Let $\eta = \alpha/4$. The sufficient randomness assumption tells us that the
probability, when we choose our random sequence $(\omega_1, \dots, \omega_{j-1}, \nu_{j+1}, \dots, \nu_k)$, that it is $(\eta, j)$-good is
$1 - o(|X|^{-k})$. Since there are at most $k|X|^{k-1}$ ways of choosing j and $\omega_1, \dots, \omega_{j-1}$, it follows that with
probability 1 - o(1), every single such choice results in an $(\eta, j)$-good sequence. That is, we have the
inequality

$$\| *_j(\omega_{1,a_1}, \dots, \omega_{j-1,a_{j-1}}, \nu_{j+1}, \dots, \nu_k) - \circ_j(\omega_{1,a_1}, \dots, \omega_{j-1,a_{j-1}}, \nu_{j+1}, \dots, \nu_k) \|_1 \le \eta.$$

The condition that $|W_i| = (1 + o(1))q|X|$ implies that, for every $1 \le i \le k$ and every $a \in \mathbb{Z}_n$,
$\mathbb{E}_x \omega_{i,a}(x) = (1 + o(1))\mathbb{E}_x \omega'_{i,a}(x)$. Therefore, for |X| sufficiently large,

$$\| *_j(\omega'_{1,a_1}, \dots, \omega'_{j-1,a_{j-1}}, \nu_{j+1}, \dots, \nu_k) - \circ_j(\omega'_{1,a_1}, \dots, \omega'_{j-1,a_{j-1}}, \nu_{j+1}, \dots, \nu_k) \|_1 \le 2\eta,$$

for any $\nu_{j+1}, \dots, \nu_k$ such that every choice of j and $\omega_1, \dots, \omega_{j-1}$ yields an $(\eta, j)$-good sequence.

If $\nu_{j+1}, \dots, \nu_k$ are such that there exists $(a_1, \dots, a_{j-1})$ for which $(\omega_{1,a_1}, \dots, \omega_{j-1,a_{j-1}}, \nu_{j+1}, \dots, \nu_k)$
is not good at j, then we use a "trivial" bound instead. For each fixed choice of $(V_{j+1}, \dots, V_k)$ we have

$$\mathbb{E}\| *_j(\omega'_{1,a_1}, \dots, \omega'_{j-1,a_{j-1}}, \nu_{j+1}, \dots, \nu_k) \|_1 = \| *_j(1, \dots, 1, \nu_{j+1}, \dots, \nu_k) \|_1 \le (p/q)^{k-1} \| *_j(1, \dots, 1, \mu_{i_{j+1}}, \dots, \mu_{i_k}) \|_1,$$

where the expectation here is taken over all sequences $(a_1, \dots, a_{j-1})$. The inequality follows from the
fact that $0 \le \nu_i \le (p/q)\mu_{i}$ for each i. The constant $\eta$ was at most 1, so applying property 1 if j = 1,
we find that

$$\| *_1(\mu_{i_2}, \dots, \mu_{i_k}) \|_1 \le \| \circ_1(\mu_{i_2}, \dots, \mu_{i_k}) \|_1 + \| *_1(\mu_{i_2}, \dots, \mu_{i_k}) - \circ_1(\mu_{i_2}, \dots, \mu_{i_k}) \|_1 \le 2 + \eta \le 3.$$

Similarly, applying property 2 if j > 1, we have $\| *_j(1, \dots, 1, \mu_{i_{j+1}}, \dots, \mu_{i_k}) \|_1 \le 2$. In either case,
$\mathbb{E}\| *_j(\omega'_{1,a_1}, \dots, \omega'_{j-1,a_{j-1}}, \nu_{j+1}, \dots, \nu_k) \|_1 \le 3(p/q)^{k-1} = 3L^{k-1}$.

As |X| tends to infinity, the probability that the first bound does not hold for every $(a_1, \dots, a_{j-1})$
tends to zero, and the second bound always holds. Therefore, if |X| is sufficiently large, it follows that

$$\|H\|_1 \le 2\eta + \mathbb{E}\| *_j(\omega'_{1,a_1}, \dots, \omega'_{j-1,a_{j-1}}, \nu_{j+1}, \dots, \nu_k) - \circ_j(\omega'_{1,a_1}, \dots, \omega'_{j-1,a_{j-1}}, \nu_{j+1}, \dots, \nu_k) \|_1 \le 4\eta = \alpha.$$

The result follows. □

What we have shown is not just that every element of $\Phi(i_{j+1}, \dots, i_k)$ can be approximated well
in $L_1$ and reasonably well in $L_\infty$ by a convex combination of elements of $\Phi'(i_{j+1}, \dots, i_k)$, but rather
the stronger statement that the difference is bounded above by a fixed bounded function with small
$L_1$-norm. This will be important to us later.

The title of this section was "The set of basic anti-uniform functions has few extreme points."
That is an oversimplification: the next result is what we actually mean.

Lemma 6.12. Let $0 < \alpha < 1/2k$ and let $L \ge 2$ be a positive integer with p = Lq. Then, for |X| sufficiently
large depending on L and $\alpha$, the following holds. For every positive integer $j \le k$ and every sequence
$(i_{j+1}, \dots, i_k)$ of distinct integers between 2 and m there is a collection $\Psi = \Psi(i_{j+1}, \dots, i_k)$ of at
most $|X|^{k-1}\binom{2p|X|}{2q|X|}^{k-1}(2/\alpha)^{(k-1)2q|X|}$ functions that take values in [0, 2], and a non-negative function
$H + H' = H(i_{j+1}, \dots, i_k) + H'(i_{j+1}, \dots, i_k)$ with $\|H\|_1 \le \alpha$, $\|H\|_\infty \le 2$, $\|H'\|_1 \le 3\alpha(k - 1)$ and
$\|H'\|_\infty \le 2$, such that for every function $\phi = \circ_j(g, \dots, g, f_{j+1}, \dots, f_k)$ in $\Phi(i_{j+1}, \dots, i_k)$ there is a
function $\psi$ in the convex hull of $\Psi$ with $\psi \le \phi \le \psi + H + H'$.

Proof. Choose a positive integer t such that $\alpha/2 \le t^{-1} \le \alpha$. Then there exists a non-negative function
g' with $0 \le g' \le 1$ such that every value taken by g' is a multiple of $t^{-1}$, and such that $0 \le g' \le g \le
g' + \alpha$. Also, for every h and every function $f_h$ such that $0 \le f_h \le \mu_{i_h}$ there exists a function $f'_h$ with
$0 \le f'_h \le \mu_{i_h}$ taking values that are multiples of $t^{-1}\mu_{i_h}$ such that $0 \le f'_h \le f_h \le f'_h + \alpha\mu_{i_h}$.
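One natural way to obtain such a g' is to round g down to the nearest multiple of $t^{-1}$. The Python sketch below illustrates this under the assumption $t = \lceil 1/\alpha \rceil$, which gives $\alpha/2 \le t^{-1} \le \alpha$ whenever $\alpha \le 1$; the helper name and the sample values are ours. The same rounding, applied to $f_h/\mu_{i_h}$ on the support of $\mu_{i_h}$, would give $f'_h$.

```python
from fractions import Fraction
from math import ceil, floor

def round_down(g, t):
    """Round each value of g down to a multiple of 1/t."""
    return [Fraction(floor(y * t), t) for y in g]

alpha = Fraction(3, 10)
t = ceil(1 / alpha)        # assumed choice: then alpha/2 <= 1/t <= alpha
g = [Fraction(0), Fraction(1, 7), Fraction(5, 8), Fraction(1)]
g_prime = round_down(g, t)

assert alpha / 2 <= Fraction(1, t) <= alpha
assert all((gp * t).denominator == 1 for gp in g_prime)       # multiples of 1/t
assert all(0 <= gp <= y <= gp + alpha for gp, y in zip(g_prime, g))
```

Each rounded function takes at most t + 1 values on every point of its domain, which is what keeps the later count of normalized restrictions small.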

We would now like to show, for any such choice of g and $f_{j+1}, \dots, f_k$, that the functions $\phi =
\circ_j(g, \dots, g, f_{j+1}, \dots, f_k)$ and $\phi' = \circ_j(g', \dots, g', f'_{j+1}, \dots, f'_k)$ are reasonably close. We shall consider
the two cases j = 1 and j > 1 separately.






If j = 1 then

$$\begin{aligned}
\phi - \phi' &= \sum_{h=2}^{k} \big(\circ_1(f'_2, \dots, f'_{h-1}, f_h, \dots, f_k) - \circ_1(f'_2, \dots, f'_h, f_{h+1}, \dots, f_k)\big) \\
&\le \sum_{h=2}^{k} \big(*_1(f'_2, \dots, f'_{h-1}, f_h, \dots, f_k) - *_1(f'_2, \dots, f'_h, f_{h+1}, \dots, f_k)\big) \\
&= \sum_{h=2}^{k} *_1(f'_2, \dots, f'_{h-1}, f_h - f'_h, f_{h+1}, \dots, f_k) \\
&\le \sum_{h=2}^{k} *_1(\mu_{i_2}, \dots, \mu_{i_{h-1}}, \alpha\mu_{i_h}, \mu_{i_{h+1}}, \dots, \mu_{i_k}) \\
&= \alpha(k - 1)\, *_1(\mu_{i_2}, \dots, \mu_{i_k}).
\end{aligned}$$

Since $\eta \le 1$, property 1 implies that

$$\| *_1(\mu_{i_2}, \dots, \mu_{i_k}) \|_1 \le 3,$$

so we find that $\circ_1(f_2, \dots, f_k) - \circ_1(f'_2, \dots, f'_k)$ is bounded above by a function $H'(i_2, \dots, i_k)$ with
$L_1$-norm at most $3\alpha(k - 1)$. It is clearly also bounded above by 2.

Lemma 6.11 gives us $H = H(i_2, \dots, i_k)$ with $\|H\|_1 \le \alpha$, $\|H\|_\infty \le 2$, and also $\psi$ in the convex hull of
$\Phi'(i_2, \dots, i_k)$ such that $0 \le \circ_1(f'_2, \dots, f'_k) - \psi \le H$. Putting these two facts together implies the required bounds
on H and H' for the case j = 1.

If j > 1 then a very similar argument shows that

$$\phi - \phi' \le \alpha(k - 1)\, *_j(1, \dots, 1, \mu_{i_{j+1}}, \dots, \mu_{i_k}).$$

By property 2, $\| *_j(1, \dots, 1, \mu_{i_{j+1}}, \dots, \mu_{i_k}) \|_\infty \le 2$, so in this case we have a function $H'(i_{j+1}, \dots, i_k)$
with $L_\infty$-norm at most $2\alpha(k - 1) \le 2$ and therefore with $L_1$-norm at most $2\alpha(k - 1)$.

All that remains is to count the number of functions in $\Phi'(i_{j+1}, \dots, i_k)$ that are normalized restrictions of
functions of the form $\circ_j(g', \dots, g', f'_{j+1}, \dots, f'_k)$. It is here that we shall use the assumption mentioned
at the beginning of the subsection, that the sets $U_i$ each have cardinality at most 2p|X|. There are
at most $|X|^{j-1}$ choices for the sequence $(a_1, \dots, a_{j-1})$, and for each $j + 1 \le i \le k$, because of the upper
bound on the sizes of the $U_i$ and $V_i$, there are at most $\binom{2p|X|}{2q|X|}$ choices for the set $V_i$. (Note that
since $p = Lq \ge 2q$, the largest binomial coefficient is indeed this one.) Finally, each valuation of each
function has at most $t \le 2/\alpha$ possible results and each of the k - 1 functions has a domain of size at
most 2q|X|. Therefore, the number of normalized restrictions is at most

$$|X|^{k-1}\binom{2p|X|}{2q|X|}^{k-1}(2/\alpha)^{(k-1)2q|X|},$$

as required. □
6.3 The proof for products of basic anti-uniform functions

Our next task is to generalize Lemma 6.12 to a result that applies not just to basic anti-uniform
functions but also to products of at most d such functions. This is a formal consequence of Lemma
6.12. The exact nature of the bounds we obtain for $\|J\|_1$ and $\|J\|_\infty$ is unimportant: what matters is
that the first can be made arbitrarily small and the second is bounded. We need a definition.

Definition 6.13. If $\phi = \circ_j(g, g, \dots, g, f_{i_{j+1}}, \dots, f_{i_k})$ is a basic anti-uniform function, where $0 \le f_{i_h} \le
\mu_{i_h}$, define the profile of $\phi$ to be the ordered set $(i_{j+1}, \dots, i_k)$, and if $\xi$ is a product of d basic
anti-uniform functions $\phi_i$, then define the profile of $\xi$ to be the set of all d profiles of the $\phi_i$. We will refer
to d as the size of the profile.

Corollary 6.14. Let $0 < \alpha < 1/2k$ and let $L \ge 2$ be a positive integer with p = Lq. Then for every profile
A of size d there is a collection $\Delta = \Delta(A)$ of at most $|X|^{kd}\binom{2p|X|}{2q|X|}^{kd}(2/\alpha)^{2kdq|X|}$ functions that take
values in $[0, 2^d]$ and a non-negative function J = J(A) with $\|J\|_1 \le d\alpha k 6^d$ and $\|J\|_\infty \le d6^d$, such that
for every function $\xi$ that is a product of basic anti-uniform functions with profile A, there is a function
$\psi$ in the convex hull of $\Delta$ with $\psi \le \xi \le \psi + J$.

Proof. Every function with profile A is a product $\phi_1 \cdots \phi_d$, where each $\phi_i$ is a basic anti-uniform
function with some fixed profile. That is, each $\phi_i$ belongs to a fixed set of the form $\Phi(i_{j+1}, \dots, i_k)$.
By Lemma 6.12 we can find $\psi_i$ such that $\psi_i \le \phi_i \le \psi_i + J_i$, where $\psi_i$ belongs to the convex hull of a
set of size at most $|X|^{k-1}\binom{2p|X|}{2q|X|}^{k-1}(2/\alpha)^{(k-1)2q|X|}$, and $J_i$ is a fixed function such that $\|J_i\|_\infty \le 4$ and
$\|J_i\|_1 \le 4\alpha k$.

It follows that $\prod_i \psi_i \le \xi \le \prod_i (\psi_i + J_i)$. But

$$\prod_{i=1}^{d} (\psi_i + J_i) - \prod_{i=1}^{d} \psi_i \le \sum_{h=1}^{d} J_h \prod_{i \ne h} (\psi_i + J_i).$$

Since each $\psi_i$ has $L_\infty$-norm at most 2, the latter function has $L_1$-norm at most $d\alpha k 6^d$ and $L_\infty$-norm
at most $d6^d$, as claimed. □
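The product step can be sanity-checked pointwise. The Python sketch below assumes the bound has the telescoped form $\prod_i(\psi_i + J_i) - \prod_i \psi_i \le \sum_h J_h \prod_{i \ne h}(\psi_i + J_i)$ for non-negative values with $\psi_i \le 2$ and $J_i \le 4$; the numbers are hypothetical values of the functions at one point, and the function names are ours.

```python
from fractions import Fraction
from math import prod

def product_gap(psi, J):
    """Return prod(psi_i + J_i) - prod(psi_i) at a single point."""
    return prod(p + j for p, j in zip(psi, J)) - prod(psi)

def telescoped_bound(psi, J):
    """Return sum_h J_h * prod_{i != h} (psi_i + J_i) at a single point."""
    d = len(psi)
    return sum(
        J[h] * prod(psi[i] + J[i] for i in range(d) if i != h)
        for h in range(d)
    )

# Hypothetical pointwise values with 0 <= psi_i <= 2 and 0 <= J_i <= 4.
psi = [Fraction(2), Fraction(1, 2), Fraction(3, 2)]
J = [Fraction(1, 4), Fraction(4), Fraction(0)]
d = len(psi)

# Each factor psi_i + J_i is at most 6, so the bound is at most d * 4 * 6^(d-1).
assert 0 <= product_gap(psi, J) <= telescoped_bound(psi, J) <= d * 4 * 6 ** (d - 1)
```

Since $d \cdot 4 \cdot 6^{d-1} \le d6^d$, this matches the stated $L_\infty$ bound, and the $L_1$ bound follows in the same way from $\|J_h\|_1 \le 4\alpha k$.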

We are now ready for the main result of this section. 

Lemma 6.15. Assumptions 0, 1 and 2 imply assumption 3. 

Proof. Let $U_1, \dots, U_m$ be random subsets of X with elements chosen independently with probability
p. Assumptions 0, 1 and 2 tell us that we have properties 0, 1 and 2 with probability $1 - |X|^{-\omega(1)}$.

Next, let A be a profile, and suppose that i is not involved in A. Let $\Gamma$ be the set of all $\xi$ with profile
A, and let $\Delta = \Delta(A)$ be the set that is given to us by Corollary 6.14 (which we have with probability
$1 - |X|^{-\omega(1)}$).

Note that for every function $\xi \in \Gamma$, there exists $\psi$ in $\Delta$ such that $|\xi - \psi| \le H$, where $\|H\|_1 \le d\alpha k 6^d$
and $\|H\|_\infty \le d6^d$. Let $\alpha = \lambda/12kd6^d$. Then Corollary 6.3 implies that, with probability tending to 1,

$$\max\{|\langle \mu_i - 1, \phi \rangle| : \phi \in \Gamma\} \le \max\{|\langle \mu_i - 1, \psi' \rangle| : \psi' \in \Delta\} + \lambda/4.$$

Note that this step depends critically on the fact that $\mu_i$ is entirely independent of the set $\Delta(A)$. It
was for this purpose that we chose m random sets $U_1, \dots, U_m$ rather than one single random set U.

It remains to prove that, with probability $1 - |X|^{-\omega(1)}$, $\max\{|\langle \mu_i - 1, \psi' \rangle| : \psi' \in \Delta\} \le \lambda/4$.
By Lemma 5.2, since $\|\psi'\|_\infty \le 2^d$ for all $\psi' \in \Delta$, the probability that $|\langle \mu_i - 1, \psi' \rangle| > \lambda/4$ is at
most $\exp(-\lambda^2 p|X|/2^{2d+10})$ for any given $\psi'$. Since p = Lq, we may estimate the number of elements
in $\Delta(A)$ as follows.

$$|\Delta(A)| \le |X|^{kd}\binom{2p|X|}{2p|X|/L}^{kd}\left(\frac{2}{\alpha}\right)^{2kdp|X|/L} \le |X|^{kd}\left(\frac{72Lkd6^d}{\lambda}\right)^{2kdp|X|/L}.$$

If we choose L sufficiently large (depending on k, d and $\lambda$), then we can arrange for the sum of the
probabilities, which is at most $\exp(-\lambda^2 p|X|/2^{2d+10})|\Delta(A)|$, to be $|X|^{-\omega(1)}$.

We are almost done. We now wish to prove a result about $\mu = m^{-1}(\mu_1 + \dots + \mu_m)$. Applying our result
so far to all profiles simultaneously, we find that with probability $1 - |X|^{-\omega(1)}$, $|\langle \mu_i - 1, \xi \rangle| \le \lambda/2$ for
every $\mu_i$ and $\xi$ such that i is not involved in the profile of $\xi$. Fix a particular $\xi_0$. If we choose i at
random, the probability that it is involved in the profile of $\xi_0$ is at most (k - 1)d/m. Furthermore, for
any i, we have the trivial bound $|\langle \mu_i - 1, \xi_0 \rangle| \le 2^{d+2}$, since $\|\xi_0\|_\infty \le 2^d$ and, for |X| sufficiently large,
$\|\mu_i - 1\|_1 \le 3$. Therefore,

$$|\langle \mu - 1, \xi_0 \rangle| \le \mathbb{E}_i |\langle \mu_i - 1, \xi_0 \rangle| \le \frac{(k - 1)d}{m} 2^{d+2} + \frac{\lambda}{2} \le \lambda,$$

provided $m \ge kd2^{d+3}/\lambda$. The result follows. □
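The arithmetic in the last step is easy to check. The Python sketch below reads the final bound as $\frac{(k-1)d}{m}2^{d+2} + \frac{\lambda}{2}$ and the condition on m as $m \ge kd2^{d+3}/\lambda$; both readings are our assumptions about the intended formulas, and the parameter values are arbitrary.

```python
from fractions import Fraction
from math import ceil

def final_bound(k, d, m, lam):
    """Assumed form of the bound: ((k-1)d/m) * 2^(d+2) + lam/2."""
    return Fraction((k - 1) * d, m) * 2 ** (d + 2) + lam / 2

k, d, lam = 3, 2, Fraction(1, 10)        # arbitrary illustrative parameters
m = ceil(k * d * 2 ** (d + 3) / lam)     # smallest integer m >= k d 2^(d+3) / lambda

assert final_bound(k, d, m, lam) <= lam
```

The first term is the contribution of the at most (k - 1)d indices i involved in the profile, each bounded trivially, and the second is the contribution of all other indices.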



6.4 Obtaining assumption 3' as well 

It is possible to add a fixed set of bounded functions to the collection of basic anti-uniform functions,
provided only that this set has size smaller than $2^{p|X|/L_0}$, where $L_0$ is again some constant depending
only on k, $\lambda$ and d, and the above proof continues to work. Indeed, adding such a collection can
increase the size of the set of products of basic anti-uniform functions by a factor of at most $2^{dp|X|/L_0}$.
Therefore, when we come to the final line of the penultimate paragraph of the proof of the previous
lemma, provided $L_0$ and L have been chosen large enough, the probability that the random measure
$\mu_i$ correlates with any given function is still small enough to guarantee that with high probability
$\max\{|\langle \mu_i - 1, \psi' \rangle| : \psi' \in \Gamma \cup T\} \le \lambda/4$, where $\Gamma$ is the set of functions with a given profile that does
not involve $\mu_i$. The remainder of the proof is the same, in that we add over all profiles and rule out
the set of small exceptions where the set $U_i$ is involved in the profile of $\xi$.

Later, when we come to apply this observation, T will be a collection of characteristic functions.
For example, to prove a stability version of Turán's theorem, the set T will be the collection of
characteristic measures of vertex subsets of $\{1, \dots, n\}$. This has size $2^n$. Therefore, provided $p \ge Cn^{-1}$,
for C sufficiently large, we will have control over local densities.






7 Probabilistic estimates I: tail estimates 



In this section we shall focus on proving assumption 2. That is, we shall show that with high probability
$\| *_j(1, 1, \dots, 1, \mu_{i_{j+1}}, \dots, \mu_{i_k}) \|_\infty \le 2$ for every $j \ge 2$ and every sequence $i_{j+1}, \dots, i_k$ of distinct integers
between 1 and m. It will be helpful for the next section if we actually prove the following very slightly
more general statement. For every $1 \le j \le k$, every collection of measures $\nu_1, \dots, \nu_k$ such that at least
one of the measures other than $\nu_j$ is the constant measure 1 and the rest are distinct measures of the
form $\mu_i$ has the property that $\| *_j(\nu_1, \dots, \nu_k) \|_\infty \le \frac{3}{2}$.

Up to now, our argument has been general. Unfortunately, we must now be more specific about 
the kind of sets that we are dealing with. We shall split into two cases. First, we shall look at systems 
S with the following property. 

Definition 7.1. A system S of ordered sequences of length k in a set X has two degrees of freedom
if, whenever s and t are two elements of S and there exist $i \ne j$ such that $s_i = t_i$ and $s_j = t_j$, we have
s = t.

This includes the case when S is the set of arithmetic progressions in $\mathbb{Z}_n$, and higher-dimensional
generalizations concerning homothetic copies of a single set.
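Definition 7.1 can be tested by brute force on small examples. The Python sketch below is an illustration under our own conventions: progressions are taken with non-zero common difference, and n is assumed prime so that non-zero differences are invertible in $\mathbb{Z}_n$. The composite case n = 8 shows why some such assumption is needed.

```python
from itertools import combinations

def arithmetic_progressions(n, k):
    """All k-term progressions (a, a+d, ..., a+(k-1)d) in Z_n with d != 0."""
    return [
        tuple((a + i * d) % n for i in range(k))
        for a in range(n)
        for d in range(1, n)
    ]

def has_two_degrees_of_freedom(S):
    """Check Definition 7.1: distinct sequences never agree in two positions."""
    for s, t in combinations(S, 2):
        agreements = sum(1 for x, y in zip(s, t) if x == y)
        if agreements >= 2:
            return False
    return True

# Prime modulus: two positions determine the whole progression.
assert has_two_degrees_of_freedom(arithmetic_progressions(7, 3))
# Composite modulus: (0, 2, 4) and (0, 6, 4) agree in two places but differ.
assert not has_two_degrees_of_freedom(arithmetic_progressions(8, 3))
```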

After that, we will look at graphs and hypergraphs. In this case, the required estimates are much 
more difficult. Thankfully, most of the hard work has already been done for us by Janson, Ruciński
and, in one paper, Oleszkiewicz [26, 27, 28, 29, 30] (see also the paper of Vu [63]). We shall return to
these estimates later. 

7.1 The proof for systems with two degrees of freedom 

Let $U_1, \dots, U_m$ be independent random sets chosen binomially and let their associated measures be
$\mu_1, \dots, \mu_m$. We are interested in quantities of the form $*_j(\nu_1, \dots, \nu_k)(x)$, where each $\nu_i$ (with $i \ne j$)
is equal to either the constant function 1 or to one of the measures $\mu_r$. We also insist that no two of
the $\nu_i$ are equal to the same $\mu_r$ and that at least one of the $\nu_i$ is the constant function.

Suppose that the set of i such that $\nu_i$ is one of the $\mu_r$ is $\{a_1, \dots, a_l\}$ and that $\nu_{a_h} = \mu_{b_h}$ for
h = 1, 2, ..., l. Then we can interpret $*_j(\nu_1, \dots, \nu_k)(x)$ as follows. Recall that $S_j(x)$ is the set of all
$s = (s_1, \dots, s_k) \in S$ such that $s_j = x$. And $*_j(\nu_1, \dots, \nu_k)(x)$ is equal to $p^{-l}$ times the proportion of
$s \in S_j(x)$ such that $s_{a_h} \in U_{b_h}$ for every h = 1, ..., l. This is because $\nu_{a_h}(s_{a_h}) = p^{-1}$ if $s_{a_h} \in U_{b_h}$ and 0
otherwise.

Now let us regard sequences $s \in S$ as fixed and $U_1, \dots, U_m$ as random variables. For each s let E(s)
be the event that $s_{a_h} \in U_{b_h}$ for every h = 1, ..., l (so E(s) is an event that depends on $U_1, \dots, U_m$).
We claim that if s and t are distinct sequences in $S_j(x)$, then E(s) and E(t) are independent. The
reason for this is that we know that $s_j = t_j$, and our assumption that S has two degrees of freedom
therefore implies that there is no other i such that $s_i = t_i$. It follows that the events $s_{a_h} \in U_{b_h}$ and
$t_{a_h} \in U_{b_h}$ are independent (since the sets $U_i$ are chosen binomially) and hence that E(s) and E(t) are
independent (since the sets $U_{b_1}, \dots, U_{b_l}$ are independent).

Lemma 7.2. Let X be a finite set, let S be a homogeneous collection of ordered subsets of X, each
of size k, and suppose that S has two degrees of freedom. Let $U_1, \dots, U_k$ be random subsets of X with
associated measures $\mu_1, \dots, \mu_k$, each chosen binomially with probability p. Let $1 \le j \le k$ and let L
be a subset of $\{1, 2, \dots, k\} \setminus \{j\}$ of cardinality l < k - 1. For each $i \le k$, let $\nu_i = \mu_i$ if $i \in L$ and
1 otherwise. Let $x \in X$. Then the probability that $*_j(\nu_1, \dots, \nu_{j-1}, \nu_{j+1}, \dots, \nu_k)(x) \le \frac{3}{2}$ is at least
$1 - 2\exp(-p^l|S_j(x)|/16)$.

Proof. Let $\chi_i$ be the characteristic function of $U_i$. Suppose that $L = \{a_1, \dots, a_l\}$. Then

$$*_j(\nu_1, \dots, \nu_{j-1}, \nu_{j+1}, \dots, \nu_k)(x) = \mathbb{E}_{s \in S_j(x)} \prod_{i \in L} \mu_i(s_i) = p^{-l}\, \mathbb{E}_{s \in S_j(x)} \prod_{i \in L} \chi_i(s_i).$$

Now $\prod_{i \in L} \chi_i(s_i)$ is the characteristic function of the event E(s) mentioned just before the statement of
this lemma, in the case when $b_h = a_h$ for every h. As we have discussed, these events are independent.
Moreover, they each have probability $p^l$. Therefore, $\sum_{s \in S_j(x)} \prod_{i \in L} \chi_i(s_i)$ is a sum of $|S_j(x)|$
independent Bernoulli random variables of probability $p^l$.

By Chernoff's inequality, Lemma 5.3, $\sum_{s \in S_j(x)} \prod_{i \in L} \chi_i(s_i) \le \frac{3}{2}p^l|S_j(x)|$ with probability at least
$1 - 2\exp(-p^l|S_j(x)|/16)$. Therefore, $\mathbb{E}_{s \in S_j(x)} \prod_{i \in L} \chi_i(s_i) \le \frac{3}{2}p^l$ with the same probability. This proves
the result. □
It is perhaps not immediately obvious how the bound for the probability in the last lemma relates
to sharp values for p in applications. To get a feel for this, consider the case when S is the set of
arithmetic progressions of length k in $\mathbb{Z}_n$. Then $|S_j(x)| = n$ for every x and j and |X| = n. We
want to be able to take p to be around $n^{-1/(k-1)}$. With this value, $\exp(-p^l|S_j(x)|/16)$ takes the form
$\exp(-cn^{1-l/(k-1)})$. In the worst case, when l = k - 2, this works out to be $\exp(-cn^{1/(k-1)})$, which
drops off faster than any power of n. If we took l = k - 1 then we would no longer have an interesting
statement: that is why convolutions where every $\nu_i$ is equal to some $\mu_j$ must be treated in a different
way.
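The exponent bookkeeping in this example can be written out explicitly. In the Python sketch below, which is only an illustration with a function name of our choosing, we take $p = n^{-1/(k-1)}$ and $|S_j(x)| = n$, so that $p^l|S_j(x)| = n^{1 - l/(k-1)}$.

```python
from fractions import Fraction

def tail_exponent(k, l):
    """Exponent e with p^l * n = n^e when p = n^(-1/(k-1)) and |S_j(x)| = n."""
    return 1 - Fraction(l, k - 1)

# Worst admissible case l = k - 2: the exponent is 1/(k-1), still positive.
assert tail_exponent(4, 2) == Fraction(1, 3)
# l = k - 1: the exponent is 0, so the tail bound no longer decays with n.
assert tail_exponent(4, 3) == 0
```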

7.2 The proof for strictly balanced graphs and hypergraphs 

We now turn to the more difficult case of finding copies of a fixed balanced graph or hypergraph.
Again, we are trying to show that $*_j(\nu_1, \dots, \nu_k)(x)$ is reasonably close to 1 with very high probability,
but now this quantity is a normalized count of certain graphs or hypergraphs. Normally when one has
a large deviation inequality, one expects the probability of large deviations to be exponentially small
in the expectation. In the graph case a theorem of roughly this variety may be proved for the lower
tail by using Janson's inequality [26], but the behaviour of the upper tail is much more complex. The
best that can be achieved is a fixed power of the expectation. The result that we shall use in this case
is due to Janson and Ruciński [29]. Before we state it, we need some preliminary discussion.

To begin with, let us be precise about what we are taking as X and what we are taking as S. We
are counting copies of a fixed labelled k-uniform hypergraph H. Let H have vertex set V of size m
and (labelled) edge set $(e_1, \dots, e_r)$. (That is, each $e_i$ is a subset of V of size k and we choose some
arbitrary ordering.) Let W be a set of size n (which we think of as large) and let $X = W^{(k)}$, the set
of all subsets of W of size equal to k.

Given any injection $\phi : V \to W$ we can form a sequence $(s_1, \dots, s_r)$ of subsets of W by setting
$s_i = \phi(e_i)$. We let S be the set of all sequences that arise in this way. The elements of S are copies of
H with correspondingly labelled edges.

If we fix an edge $e \in X$ and an index j, then $S_j(e)$ is the set of all sequences $(s_1, \dots, s_r)$ in S such
that $s_j = e$. To obtain such a sequence, one must take a bijection from $e_j$ (which is a subset of V of
size k) to e (which is a subset of W of size k) and extend it to an injection $\phi$ from V to W. One then
sets $s_i = \phi(e_i)$ for each i.

Now let $U_1, \dots, U_m$ be independent random subsets of X, chosen binomially with probability p and
let their associated measures be $\mu_1, \dots, \mu_m$. Suppose once again that $\nu_1, \dots, \nu_r$ are measures, some of
which are constant and some of which are equal to distinct $\mu_i$. Suppose that the non-trivial measures,
not including $\nu_j$ if it is non-trivial, are $\nu_{a_1}, \dots, \nu_{a_l}$, and suppose that $\nu_{a_i} = \mu_{b_i}$ for i = 1, 2, ..., l. Then
the value $*_j(\nu_1, \dots, \nu_r)(e)$ of the jth convolution at e is equal to

$$\mathbb{E}_{s \in S_j(e)} \prod_{1 \le i \le l} \mu_{b_i}(s_{a_i}).$$

This is $p^{-l}|S_j(e)|^{-1}$ times the number of sequences $(s_1, \dots, s_r) \in S$ such that $s_j = e$ and $s_{a_i} \in U_{b_i}$ for every
$1 \le i \le l$. If we define H' to be the subhypergraph of H that consists of the edges $e_{a_1}, \dots, e_{a_l}$, then
each such sequence is a so-called $e_j$-rooted copy of H' in (e, X). That is, it is a copy of H' where we
insist that the vertices in $e_j$ map bijectively to the vertices in e. We are interested in the number of
rooted copies such that the edges fall into certain sparse random sets. This is not an easy calculation,
but it has been done for us by Janson and Ruciński. In order to state the result we shall need, let us
define formally the random variable that we wish not to deviate much from its mean.

Notation. Let K be a k-uniform hypergraph and e an edge in K. Let l be the number of edges in
$K \setminus \{e\}$ and let $U_1, \dots, U_l$ be random binomial subhypergraphs of the complete k-uniform hypergraph
$K_n^{(k)}$ on n vertices, each edge being chosen with probability p, with characteristic functions $\chi_1, \dots, \chi_l$.
Let $S_e$ be the set consisting of all labelled ordered copies of $K \setminus \{e\}$ in $K_n^{(k)}$ that are e-rooted at a given
edge e'. Then the random variable $Y_K^e$ is given by

$$Y_K^e = \sum_{s \in S_e} \prod_{1 \le i \le l} \chi_i(s_i).$$

Strictly speaking $Y_K^e$ depends on e' as well, but we omit this from the notation because it makes
no difference to the probabilities which edge e' we choose. (So we could, for example, state the result
for $e' = \{1, 2, \dots, k\}$ and deduce it for all other e'.)

The number of injections $\phi$ that extend a bijection from $e$ to $e'$ is $k!(n-k)(n-k-1)\cdots(n-v_K+1)$,
and for each one the probability that $s_i \in U_i$ for every $i$ is $p^l = p^{e_K - 1}$, so the expectation $\mathbb{E}Y_K^e$ is

$$p^{e_K - 1}\, k!\,(n-k)(n-k-1)\cdots(n-v_K+1).$$

The precise details will not matter to us much, but note that the order of magnitude is $p^{e_K - 1} n^{v_K - k}$.
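
As a sanity check on this count, here is a minimal Python sketch. It is not part of the paper: the function names and the brute-force enumeration are illustrative assumptions. It evaluates $k!(n-k)\cdots(n-v_K+1)$ and compares it with a direct enumeration of injections for small parameters.

```python
from itertools import permutations
from math import factorial

def rooted_injection_count(n, v, k):
    """k! * (n-k)(n-k-1)...(n-v+1): injections of a v-set into [n]
    that map a fixed k-subset bijectively onto a fixed k-subset."""
    count = factorial(k)
    for j in range(k, v):
        count *= n - j
    return count

def brute_force_count(n, v, k):
    """Enumerate injections phi: {0..v-1} -> {0..n-1} with phi({0..k-1}) = {0..k-1}."""
    root = set(range(k))
    return sum(1 for phi in permutations(range(n), v) if set(phi[:k]) == root)

def expected_rooted_copies(n, v, k, edges, p):
    """The expectation p^(e_K - 1) * k!(n-k)...(n-v+1) stated in the text."""
    return p ** (edges - 1) * rooted_injection_count(n, v, k)
```

For a triangle in a graph ($k = 2$, $v_K = 3$, $e_K = 3$) the count is $2(n-2)$, which has the stated order of magnitude $n^{v_K - k} = n$.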

We are now ready to state the result of Janson and Ruciński. It is actually a very special case of
a much more general result (Corollary 4.3 from [29]). To explain the general statement would lead us
too far astray so we restrict ourselves to stating the required corollary.

Lemma 7.3. Let $K$ be a $k$-uniform hypergraph and $e$ a fixed edge. Then there exists a constant $c$
such that the random variable $Y_K^e$ satisfies

$$\mathbb{P}\left(Y_K^e \ge \tfrac{3}{2}\,\mathbb{E}Y_K^e\right) \le 2n^{v_K} \exp\left(-c \min_{L \subseteq K} (\mathbb{E}Y_L^e)^{1/v_L}\right).$$

A better, indeed almost sharp, result has recently been proved by Janson and Ruciński [30].
Unfortunately, though the result almost certainly extends to hypergraphs, it is stated by these authors
only for graphs. However, the previous result is more than sufficient for our current purposes.

We are now ready to prove assumption 2 in the case of strictly balanced graphs and hypergraphs.

The proof is essentially the same as it was for systems with two degrees of freedom, except that we
have to use the results of Janson and Ruciński instead of using Chernoff's inequality. Recall that a
$k$-uniform hypergraph $H$ is strictly $k$-balanced if $m_k(H) > m_k(K)$ for every proper subhypergraph $K$
of $H$, where $m_k(H)$ is defined to be $(e_H - 1)/(v_H - k)$.
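
The definition can be checked mechanically on small examples. The following Python sketch is an illustration only, not the paper's method: the names `m_k` and `is_strictly_k_balanced` are hypothetical, and it assumes that only subhypergraphs with at least two edges are compared, since a single edge gives the indeterminate ratio $0/0$.

```python
from fractions import Fraction
from itertools import combinations

def m_k(edges, k):
    """(e - 1)/(v - k) for a hypergraph given as a collection of k-element vertex sets.
    Assumes at least two distinct edges, so that v > k."""
    vertices = set().union(*edges)
    return Fraction(len(edges) - 1, len(vertices) - k)

def is_strictly_k_balanced(edges, k):
    """Check m_k(H) > m_k(K) for every proper subhypergraph K with at least two edges."""
    edges = [frozenset(e) for e in edges]
    target = m_k(edges, k)
    for size in range(2, len(edges)):
        for sub in combinations(edges, size):
            if m_k(sub, k) >= target:
                return False
    return True
```

For instance, the triangle has $m_2 = 2$ and is strictly 2-balanced, whereas a triangle with a pendant edge has $m_2 = 3/2$ and is not, because it contains the denser triangle.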

Lemma 7.4. Let $H$ be a strictly $k$-balanced $k$-uniform hypergraph. Let $X = K_n^{(k)}$ and let $S$ be the
collection of labelled ordered copies of $H$ in $X$. Let $U_1, \dots, U_k$ be random subsets of $X$, each chosen
binomially with probability $p$, and let their characteristic measures be $\mu_1, \dots, \mu_k$. Let $1 \le j \le k$ and
let $L$ be a subset of $\{1, 2, \dots, k\} \setminus \{j\}$ of cardinality $l < k - 1$. For each $i \le k$, let $\nu_i = \mu_i$ if $i \in L$ and
$1$ otherwise. Let $e \in X$. Then for $p \ge n^{-1/m_k(H)}$ there exist positive constants $a$ and $A$ such that the
probability that $*_j(\nu_1, \dots, \nu_{j-1}, \nu_{j+1}, \dots, \nu_k)(e) \le 3/2$ is at least $1 - 2n^{v_H} e^{-An^a}$.

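Proof. Let $\chi_i$ be the characteristic function of $U_i$. Suppose that $L = \{a_1, \dots, a_l\}$. Then

$$*_j(\nu_1, \dots, \nu_{j-1}, \nu_{j+1}, \dots, \nu_k)(e) = p^{-l}\, \mathbb{E}_{s \in S_j(e)} \prod_{i \in L} \chi_i(s_i).$$

The sum $\sum_{s \in S_j(e)} \prod_{i \in L} \chi_i(s_i)$ counts the number of rooted copies of some proper subhypergraph $K$
of $H$. By Lemma 7.3, the probability that $\sum_{s \in S_j(e)} \prod_{i \in L} \chi_i(s_i) \ge \tfrac{3}{2} p^l |S_j(e)|$ is at most

$$2n^{v_K} \exp\left(-c \min_{J \subseteq K} (\mathbb{E}Y_J^e)^{1/v_J}\right) = 2n^{v_K} \exp\left(-c \min_{J \subseteq K} (n^{v_J - k} p^{e_J - 1})^{1/v_J}\right).$$

Since $H$ is strictly $k$-balanced, we know that $m_k(J) < m_k(H)$ for every $J \subseteq K$. Therefore, there is a
positive constant $a'$ such that if $p \ge n^{-1/m_k(H)}$, then for each $J \subseteq K$ we have the inequality

$$n^{v_J - k} p^{e_J - 1} \ge n^{a'}.$$

Therefore,

$$\min_{J \subseteq K} (n^{v_J - k} p^{e_J - 1})^{1/v_J} \ge n^{a},$$

for some $a$, and hence the probability that $\sum_{s \in S_j(e)} \prod_{i \in L} \chi_i(s_i) \ge \tfrac{3}{2} p^l |S_j(e)|$ is at most $2n^{v_H} e^{-An^a}$
for some positive constants $A$ and $a$. The lemma follows. □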



8 Probabilistic estimates II: bounding $L_1$-differences

Our one remaining task is to show that assumption 1 holds. Recall that this is the statement that
if $U_1, \dots, U_m$ are subsets of $X$ chosen binomially with suitable probability $p$, and if their associated
measures are $\mu_1, \dots, \mu_m$, then with high probability

$$\| *_j(\mu_{i_1}, \dots, \mu_{i_{j-1}}, \mu_{i_{j+1}}, \dots, \mu_{i_k}) - \circ_j(\mu_{i_1}, \dots, \mu_{i_{j-1}}, \mu_{i_{j+1}}, \dots, \mu_{i_k}) \|_1 \le \eta$$

whenever $j$ is an integer between $1$ and $k$ and $i_1, \dots, i_{j-1}, i_{j+1}, \dots, i_k$ are distinct integers between $1$
and $m$. Of course, if we can prove this for one choice of $j$ and $i_1, \dots, i_{j-1}, i_{j+1}, \dots, i_k$ then we have
proved it for all, since $m$ and $k$ are bounded. So without loss of generality let us prove it for $j = 1$
and for the sequence $(2, \dots, k)$. That is, we shall prove that with high probability

$$\| *_1(\mu_2, \dots, \mu_k) - \circ_1(\mu_2, \dots, \mu_k) \|_1 \le \eta.$$

The basic approach is to show that with high probability the sets $U_2, \dots, U_{k-1}$ have certain
properties that we can exploit, and that if they have those properties then the conditional probability
that $\| *_1(\mu_2, \dots, \mu_k) - \circ_1(\mu_2, \dots, \mu_k) \|_1 \le \eta$ is also high. This strategy is almost forced on us: there
are some choices of $U_2, \dots, U_{k-1}$ that would be disastrous, and although they are rare we have to take
account of their existence.

To get some idea of what the useful properties are, let us suppose that we have chosen $U_2, \dots, U_{k-1}$,
let us fix $x \in X$, and let us think about the random variable $*_1(\mu_2, \dots, \mu_k)(x)$ (which, given our choices,
depends just on the random set $U_k$). This is, by definition,

$$\mathbb{E}_{s \in S_1(x)}\, \mu_2(s_2) \cdots \mu_{k-1}(s_{k-1}) \mu_k(s_k).$$

At this point we need an extra homogeneity assumption. We would like to split up the above
expectation according to the value of $s_k$, but that will lead to problems if different values of $s_k$ are taken
different numbers of times. Let us suppose that for each $y$ the number of $s \in S_1(x)$ such that $s_k = y$,
which is just the cardinality of the set $S_1(x) \cap S_k(y)$, only ever takes one of two values, one of which
is $0$.

In the case of arithmetic progressions of length $k$ in $\mathbb{Z}_p$, with $p$ prime, $S_1(x) \cap S_k(y)$ consists
of a unique arithmetic progression (degenerate if $x = y$), the progression with common difference
$(k-1)^{-1}(y - x)$ that starts at $x$. In the case of, say, $K_5$s in a complete graph, where $s_1$ and $s_{10}$
represent disjoint edges of $K_5$, $S_1(e) \cap S_{10}(e')$ will be empty if $e$ and $e'$ are edges of $K_n$ that share
a vertex, and will have cardinality $n - 4$ if they are disjoint. In general, in all natural examples this
homogeneity assumption is satisfied. Moreover, the proportion of $y$ for which $S_1(x) \cap S_k(y) = \emptyset$ tends
to be $O(1/n)$ and tends to correspond to degenerate cases (when those are not allowed).
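
The arithmetic-progression example can be made concrete. The Python sketch below is an illustration under stated assumptions (the function names are hypothetical, and `pow` with a negative exponent needs Python 3.8 or later). It builds the progression with first term $x$ and last term $y$ from the common difference $(k-1)^{-1}(y-x)$ and checks by brute force that it is the only one.

```python
def ap_through(x, y, k, p):
    """The k-term progression in Z_p with first term x and k-th term y.
    Assumes p is prime and 2 <= k <= p, so that k - 1 is invertible mod p."""
    d = (pow(k - 1, -1, p) * (y - x)) % p  # common difference (k-1)^{-1}(y-x)
    return tuple((x + i * d) % p for i in range(k))

def all_aps_with_ends(x, y, k, p):
    """Brute force: every progression starting at x whose k-th term is y."""
    return [tuple((x + i * d) % p for i in range(k))
            for d in range(p) if (x + (k - 1) * d) % p == y]
```

Taking $x = y$ gives common difference $0$, which is the degenerate progression mentioned in the text.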

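With the help of this assumption, we can rewrite the previous expression as follows. Let us write
$K(x)$ for the set of $y$ such that $S_1(x) \cap S_k(y) \ne \emptyset$. Then

$$*_1(\mu_2, \dots, \mu_k)(x) = \mathbb{E}_{s \in S_1(x)}\, \mu_2(s_2) \cdots \mu_{k-1}(s_{k-1}) \mu_k(s_k)$$

$$= \mathbb{E}_{y \in K(x)}\, \mu_k(y)\, \mathbb{E}_{s \in S_1(x) \cap S_k(y)}\, \mu_2(s_2) \cdots \mu_{k-1}(s_{k-1}).$$

Writing $W(x, y)$ for $\mathbb{E}_{s \in S_1(x) \cap S_k(y)}\, \mu_2(s_2) \cdots \mu_{k-1}(s_{k-1})$, we can condense this to $\mathbb{E}_{y \in K(x)}\, \mu_k(y) W(x, y)$.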

Now we are thinking of $\mu_2, \dots, \mu_{k-1}$ as fixed, and of the expressions we write as random
variables that depend on the random measure $\mu_k$. Note that the expectation of $*_1(\mu_2, \dots, \mu_k)(x)$ is
$*_1(\mu_2, \dots, \mu_{k-1}, 1)(x)$. By the results of the previous section, we are free to assume that this is at
most $3/2$ for every $x$.

Our plan is to prove that the expectation of $*_1(\mu_2, \dots, \mu_k)(x) - \circ_1(\mu_2, \dots, \mu_k)(x)$ is small for each
$x$, which will show that the expectation of $\| *_1(\mu_2, \dots, \mu_k) - \circ_1(\mu_2, \dots, \mu_k) \|_1$ is small. Having done
that, we shall argue that it is highly concentrated about its expectation.

Now, as we have seen, the random variable $*_1(\mu_2, \dots, \mu_k)(x)$ is equal to $\mathbb{E}_{y \in K(x)}\, \mu_k(y) W(x, y)$,
which is a sum of independent random variables $V_y$, where $V_y = (p|K(x)|)^{-1} W(x, y)$ with
probability $p$ and $0$ otherwise. The expectation of the average $\mathbb{E}_{y \in K(x)} W(x, y)$ is the expectation of
$*_1(\mu_2, \dots, \mu_{k-1}, 1)(x)$, which we are assuming to be at most $3/2$. If we also know that each $W(x, y)$
is small, then the chances that this sum is bigger than $2$ are very small. From this it is possible to
deduce that the expectation of $*_1(\mu_2, \dots, \mu_k)(x) - \circ_1(\mu_2, \dots, \mu_k)(x)$ is small. The following technical
lemma makes these arguments precise.

In the statement of the next lemma, we write $\mathbb{E}_{y \in K}$ for the average over $K$, and $\mathbb{E}$ for the
probabilistic expectation (over all possible choices of $\mu_k$ with their appropriate probabilities).

Lemma 8.1. Let $0 < p < 1$ and let $0 < \alpha \le 1$. Let $K$ be a set and for each $y \in K$ let $V_y$ be a random
variable that takes the value $C_y$ with probability $p$ and $0$ otherwise. Suppose that the $V_y$ are independent
and that each $C_y$ is at most $\alpha$. Let $S = \sum_{y \in K} V_y$ and suppose that $\mathbb{E}S \le 3/2$. Let $T = \max\{S - 2, 0\}$.
Then $\mathbb{E}T \le 7\alpha e^{-1/14\alpha}$.

Proof. If we increase the number of random variables or any of the values $C_y$, then the expectation of
$T$ increases. Therefore, we are done if we can prove the result in the case where $\mathbb{E}S = 3/2$.
We shall use the elementary identity

$$\mathbb{E}T = \int_0^\infty \mathbb{P}[T \ge t]\,dt = \int_0^\infty \mathbb{P}[S \ge 2 + t]\,dt.$$

Since $\mathbb{E}S = 3/2$, if $S \ge 2 + t$ it follows that $S - \mathbb{E}S \ge t + 1/2$. Let us bound the probability of this
event using Bernstein's inequality (Lemma 5.1).

For this we need to bound $\sum_y \mathrm{Var}(V_y)$, which is at most $\sum_y \mathbb{E}(V_y^2)$, which is at most $\alpha \sum_y \mathbb{E}(V_y)$,
by our assumption about the upper bound for each $C_y$. But this is $\alpha\mathbb{E}S = 3\alpha/2$. Therefore,

$$\mathbb{P}[S - \mathbb{E}S \ge t + 1/2] \le \exp\left(\frac{-(t + 1/2)^2}{2(3\alpha/2 + \alpha(t + 1/2)/3)}\right).$$

Writing $s = t + 1/2$, this gives us $\exp(-s^2/(3\alpha + 2\alpha s/3))$. When $s \ge 1/2$ (as it is everywhere in the
integral we are trying to bound), this is at most $\exp(-s^2/(6\alpha s + 2\alpha s/3)) \le \exp(-s/7\alpha)$, so we have
an upper bound of

$$\int_{1/2}^\infty \exp(-s/7\alpha)\,ds = 7\alpha e^{-1/14\alpha},$$

which proves the lemma. □
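
A numerical spot check of the lemma is easy in the special case where every $C_y$ equals the same value $c$, so that $S$ is $c$ times a binomial random variable. The Python sketch below is an illustration only: the function names are hypothetical and the equal-values case is an assumption made to allow an exact computation.

```python
from math import comb, exp

def expected_excess(N, p, c):
    """E max(S - 2, 0) where S = c * Binomial(N, p), i.e. all C_y equal to c."""
    total = 0.0
    for j in range(N + 1):
        prob = comb(N, j) * p**j * (1 - p)**(N - j)
        total += max(c * j - 2.0, 0.0) * prob
    return total

def lemma_bound(alpha):
    """The bound 7 * alpha * exp(-1/(14 * alpha)) from Lemma 8.1."""
    return 7 * alpha * exp(-1 / (14 * alpha))
```

With $N = 150$, $p = 0.1$ and $c = \alpha = 0.1$ we have $\mathbb{E}S = 3/2$, and the computed value of $\mathbb{E}T$ lies well below the bound.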

Corollary 8.2. Let the notation be as above, and suppose that $W(x, y) \le \alpha p|K(x)|$ for every $x$ and
$y$ and $*_1(\mu_2, \dots, \mu_{k-1}, 1)(x) \le 3/2$ for every $x$. Then

$$\mathbb{E}\left(*_1(\mu_2, \dots, \mu_k)(x) - \circ_1(\mu_2, \dots, \mu_k)(x)\right) \le 7\alpha e^{-1/14\alpha}$$

for every $x$.

Proof. As noted above, $*_1(\mu_2, \dots, \mu_k)(x)$ is a sum of independent random variables $V_y$ that take the
value $(p|K(x)|)^{-1} W(x, y)$ with probability $p$ and $0$ otherwise. By our hypothesis about $W(x, y)$,
each value $C_y = (p|K(x)|)^{-1} W(x, y)$ is at most $\alpha$, so we can apply the previous lemma. Then
$S = *_1(\mu_2, \dots, \mu_k)(x)$ and $T = *_1(\mu_2, \dots, \mu_k)(x) - \circ_1(\mu_2, \dots, \mu_k)(x)$, so the result follows. □

The next result but one is our main general lemma, after which we shall have to argue separately
for different kinds of system. We shall use the following concentration of measure result, which is an
easy and standard consequence of Azuma's inequality.

Lemma 8.3. Let $X^{(t)}$ be the collection of all subsets of size $t$ of a finite set $X$. Let $c, \lambda > 0$ and let
$F$ be a function defined on $X^{(t)}$ such that $|F(U) - F(V)| \le c$ whenever $|U \cap V| = t - 1$. Then if a
random set $U \in X^{(t)}$ is chosen, the probability that $|F(U) - \mathbb{E}F| \ge \lambda$ is at most $\exp(-\lambda^2/2c^2t)$.

Most of the conditions of the next lemma have been mentioned in the discussion above, but we 
repeat them for convenience (even though the resulting statement becomes rather long). 

Lemma 8.4. Let $X$ be a finite set and let $S$ be a homogeneous collection of ordered subsets of $X$, each
of size $k$. Let $a$ be a positive integer and suppose that, for all $x, y \in X$, $|S_1(x) \cap S_k(y)| \in \{0, a\}$. For
each $x$, let $K(x)$ be the set of $y$ such that $S_1(x) \cap S_k(y) \ne \emptyset$, and suppose that all the sets $K(x)$ have
the same size.

Let $\mu_2, \dots, \mu_{k-1}$ be measures such that $*_1(\mu_2, \dots, \mu_{k-1}, 1)(x)$ and $*_k(1, \mu_2, \dots, \mu_{k-1})(x)$ are at most
$3/2$ for every $x \in X$. For each $x, y \in X$, let

$$W(x, y) = \mathbb{E}_{s \in S_1(x) \cap S_k(y)}\, \mu_2(s_2) \cdots \mu_{k-1}(s_{k-1})$$

and suppose that $W(x, y) \le \alpha p|K(x)|$ for every $x$ and $y$.

Let $U_k$ be a random set chosen binomially with probability $p$, let $\mu_k$ be its associated measure, and
let $\eta = 14\alpha e^{-1/14\alpha}$. Then

$$\mathbb{P}\left[\| *_1(\mu_2, \dots, \mu_k) - \circ_1(\mu_2, \dots, \mu_k) \|_1 > \eta\right] \le 2|X| e^{-\eta^2 p|X|/144} + 2e^{-p|X|/4}.$$

Proof. Corollary 8.2 and linearity of expectation imply that

$$\mathbb{E}\| *_1(\mu_2, \dots, \mu_k) - \circ_1(\mu_2, \dots, \mu_k) \|_1 \le 7\alpha e^{-1/14\alpha} = \eta/2.$$

Let us write $Z$ for the random variable $\| *_1(\mu_2, \dots, \mu_k) - \circ_1(\mu_2, \dots, \mu_k) \|_1$. To complete the proof, we
shall show that $Z$ is highly concentrated about its mean.

To do this, we condition on the size of the set $U_k$ and apply Lemma 8.3. Suppose, then, that
$|U_k| = t$. We must work out by how much we can change $Z$ if we remove an element of $U_k$ and add
another.

Since the function $x \mapsto \max\{x - 2, 0\}$ is 1-Lipschitz, the amount by which we can change $Z$ is at
most the amount by which we can change $Y = \| *_1(\mu_2, \dots, \mu_k) \|_1$. But

$$\| *_1(\mu_2, \dots, \mu_k) \|_1 = \mathbb{E}_x \mathbb{E}_{s \in S_1(x)}\, \mu_2(s_2) \cdots \mu_{k-1}(s_{k-1}) \mu_k(s_k)$$

$$= \mathbb{E}_y\, \mu_k(y)\, \mathbb{E}_{s \in S_k(y)}\, \mu_2(s_2) \cdots \mu_{k-1}(s_{k-1})$$

$$= \mathbb{E}_y\, \mu_k(y)\, {*_k}(1, \mu_2, \dots, \mu_{k-1})(y).$$

We are assuming that $*_k(1, \mu_2, \dots, \mu_{k-1})(y)$ is never more than $3/2$, and $\mu_k(y)$ is always either $p^{-1}$ or
$0$, so changing one element of $U_k$ cannot change $Y$ by more than $3(p|X|)^{-1}$. (The division by $|X|$ is
because we are taking an average over $y$ rather than a sum over $y$.)

Lemma 8.3 now tells us that the probability that $Z - \mathbb{E}Z \ge \eta/2$ given that $|U_k| = t$ is at most
$\exp(-\eta^2 p^2 |X|^2/72t)$. It follows that if $t \le 2p|X|$ then the probability is at most $\exp(-\eta^2 p|X|/144)$.
By Chernoff's inequality, the probability that $t > 2p|X|$ is at most $2\exp(-p|X|/4)$. Putting these two
facts together and adding over all possible values of $t$, we obtain the result stated. □

Our aim is to prove assumption 1 for a sufficiently small constant $\eta > 0$. Therefore, it remains to
prove that, under suitable conditions on $p$, we have the bound $W(x, y) \le \alpha p|K(x)|$ for every $x, y \in X$
such that $S_1(x) \cap S_k(y)$ is non-empty, where $\alpha$ is also a sufficiently small constant. Here, the argument
once again depends on the particular form of the set of sequences $S$.

In the case of sets with two degrees of freedom, this is trivial. Let us suppose that $|K(x)| = t$ for
every $x \in X$. By definition, $S_i(x) \cap S_j(y)$ is either empty or a singleton for every $1 \le i < j \le k$ and
every $x, y \in X$. It follows, when $S_1(x) \cap S_k(y)$ is non-empty, that

$$W(x, y) = \mu_2(r_2) \cdots \mu_{k-1}(r_{k-1}),$$

where $r = (x, r_2, \dots, r_{k-1}, y)$ is the unique element of $S$ that belongs to $S_1(x) \cap S_k(y)$. This is smaller
than $\alpha pt$ as long as $p \ge (\alpha t)^{-1/(k-1)}$. Recall that in a typical instance, such as when $S$ is the set of
arithmetic progressions of length $k$ in $X = \mathbb{Z}_n$ for some prime $n$, $t$ will be very close to $n$ (or in that
case actually equal to $n$), and we do indeed obtain a bound of the form $Cn^{-1/(k-1)}$ that is within a
constant of best possible.
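
For the record, the threshold quoted above comes from a one-line computation. The following is a sketch of the arithmetic, using only the fact that each measure $\mu_i$ takes values in $\{0, p^{-1}\}$:

```latex
W(x,y) = \mu_2(r_2)\cdots\mu_{k-1}(r_{k-1}) \le p^{-(k-2)},
\qquad
p^{-(k-2)} \le \alpha p t
\iff p^{k-1} \ge (\alpha t)^{-1}
\iff p \ge (\alpha t)^{-1/(k-1)} .
```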

Thus, we have essentially already finished the proof of a sparse random version of Szemeredi's 
theorem, and of several other similar theorems. We will spell out the details of these applications later 
in the paper. Now, however, let us turn to the more difficult task of verifying the hypothesis about 
W in the case of graphs and hypergraphs. 

Let $K$ be a $k$-uniform hypergraph. Recall that we define $v_K$ to be the number of vertices of $K$,
$e_K$ to be the number of edges of $K$, and $m_k(K)$ to be the ratio $(e_K - 1)/(v_K - k)$. The significance
of $m_k(K)$ is that if $H$ is a random $k$-uniform hypergraph on $n$ vertices, with each edge chosen with
probability $p$, then the expected number of labelled copies of $K$ containing any given edge of $H$ is
approximately $p^{e_K - 1} n^{v_K - k}$ (the "approximately" being the result of a few degenerate cases), so we
need $p \ge n^{-1/m_k(K)}$ for this expected number to be at least $1$, which, at least in the density case, is
a trivial necessary condition for our theorems to hold. Our main aim is to prove that they hold when
$p \ge Cn^{-1/m_k(K)}$, where $C$ is a constant that depends on the hypergraph $K$ only.
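
To see what this threshold looks like in a familiar case, the following Python sketch (illustrative only; the function names are hypothetical) computes $m_k$ for complete hypergraphs. For graphs it gives $m_2(K_t) = (t+1)/2$, so the threshold is $n^{-2/(t+1)}$.

```python
from fractions import Fraction
from math import comb

def m_k_clique(t, k=2):
    """m_k of the complete k-uniform hypergraph on t > k vertices: (C(t,k) - 1)/(t - k)."""
    return Fraction(comb(t, k) - 1, t - k)

def threshold_exponent(t, k=2):
    """Exponent -1/m_k in the threshold p = C * n^(-1/m_k)."""
    return -1 / m_k_clique(t, k)

def expected_copies_through_edge(n, p, v, e, k=2):
    """Order-of-magnitude estimate p^(e-1) * n^(v-k) from the text."""
    return p ** (e - 1) * n ** (v - k)
```

At $p = n^{-1/m_k(K)}$ the estimate $p^{e_K - 1} n^{v_K - k}$ is exactly $1$, which is the necessary condition described above.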

In the next result, we shall take $p$ to equal $Cn^{-1/m_k(K)}$ and prove that the conclusion holds provided
$C$ is sufficiently large. However, it turns out that we have to split the result into two cases. In the
first case, we also need to assume that $C$ is smaller than $n^c$ for some small positive constant $c$, or else
the argument breaks down. However, when $C$ is larger than this (so not actually a constant) we can
quote results of Janson and Ruciński to finish off the argument. (Some of our results, in particular
density and colouring theorems, are monotone, in the sense that the result for $p$ implies the result for
all $q \ge p$. In such cases we do not need to worry about large $p$.)

Lemma 8.5. Let $K$ be a strictly $k$-balanced $k$-uniform hypergraph and let $S$ be the set of embeddings of
$K$ into the complete $k$-uniform hypergraph $K_n^{(k)}$. Then, for any positive constants $\alpha$ and $A$, there exist
constants $c > 0$ and $C_0$ such that, if $n$ is sufficiently large, $C_0 \le C \le n^c$, and $p = Cn^{-1/m_k(K)}$, then,
with probability at least $1 - n^{-A}$, if $U_2, \dots, U_{e-1}$ are random subgraphs of $K_n^{(k)}$ chosen with probability
$p$ with associated measures $\mu_2, \dots, \mu_{e-1}$,

$$W(x, y) = \mathbb{E}_{s \in S_1(x) \cap S_e(y)}\, \mu_2(s_2) \cdots \mu_{e-1}(s_{e-1}) \le \alpha p n^k,$$

for all $x, y \in X$, where we have written $e$ for $e_K$.

Proof. Let $\chi_i$ be the characteristic function of $U_i$ for each $i \le e_K$. Let $a$ be the size of each non-empty
set $S_1(x) \cap S_e(y)$ and suppose $|K(x)| = t$ for each $x$. Then

$$W(x, y) = a^{-1} p^{-(e-2)} \sum_{s \in S_1(x) \cap S_e(y)} \chi_2(s_2) \cdots \chi_{e-1}(s_{e-1}).$$

But $\sum_{s \in S_1(x) \cap S_e(y)} \chi_2(s_2) \cdots \chi_{e-1}(s_{e-1})$ is the number of sequences $(s_1, \dots, s_e) \in S$ such that $s_1 = x$,
$s_e = y$ and $s_i \in U_i$ for $i = 2, 3, \dots, e - 1$. Therefore, our aim is to prove that with high probability this
number is at most $\alpha pt \cdot a p^{e-2} = \alpha p^{e-1} at$. Let $h$ be the number of vertices in the union of the first
and $e$th edges. Then $a$ is almost exactly $n^{v_K - h}$ and $t$ is almost exactly $n^{h-k}$, so it is enough to prove
that with high probability the number of such sequences is at most $(\alpha/2) p^{e_K - 1} n^{v_K - k} = (\alpha/2) C^{e_K - 1}$.
To do this, let us estimate from above the probability that there are at least $(v_K r)^{v_K}$ such sequences.

It will be convenient to think of each sequence in $S_1(x) \cap S_e(y)$ as an embedding $\phi$ from $K$ to $K_n^{(k)}$
such that, writing $f_1, \dots, f_e$ for the edges of $K$, we have $\phi(f_1) = x$ and $\phi(f_e) = y$. Let us call $\phi$ good
if in addition $\phi(f_i) \in U_i$ for $i = 2, 3, \dots, e - 1$. Now if there are $(v_K r)^{v_K}$ good embeddings, then there
must be a sequence $\phi_1, \dots, \phi_r$ of good embeddings such that each $\phi_i(K)$ contains at least one vertex
that is not contained in any of $\phi_1(K), \dots, \phi_{i-1}(K)$. That is because the number of vertices in the
union of the images of the embeddings has to be at least $v_K r$, since the number of embeddings into a
set of size $u$ is certainly no more than $u^{v_K}$, and because each embedding has $v_K$ vertices.

It follows that there is a sequence $v_1, \dots, v_m$ of vertices that can be divided up into chunks $V_1, \dots, V_r$
of the form $V_i = \{v_{a_i}, v_{a_i + 1}, \dots, v_{a_{i+1} - 1}\}$ in such a way that for each $i$ one of the embeddings includes
all the vertices in $V_i$ and no later vertices. Let us call that embedding $\phi_i$.

Let us fix a sequence of embeddings $\phi_1, \dots, \phi_r$ such that each one has a vertex in its image that
is not in the image of any previous one. Let $v_1, v_2, \dots, v_m$ be the sequence of vertices obtained by
listing all the vertices of $\phi_1(K)$ in order (taken from an initial fixed order of the vertices of $K$), then
all the vertices of $\phi_2(K)$ that have not yet been listed, again in order, and so on. For each $i \le r$, let
$V_i$ be the set of vertices in $\phi_i(K)$ but no earlier $\phi_j(K)$. We shall now estimate the probability that
every $\phi_i$ is good. If we already know that $\phi_1, \dots, \phi_{i-1}$ are all good, then what we need to know is
how many edges belong to $\phi_i(K)$ that do not belong to $\phi_j(K)$ for any $j < i$. Let $w_i = |V_i|$ be the
number of vertices that belong to $\phi_i(K)$ and to no earlier $\phi_j(K)$, and let $d_i$ be the number of edges.
Then the conditional probability that $\phi_i$ is good is $p^{d_i}$. It follows that the probability that $\phi_1, \dots, \phi_r$
are all good is $p^{d_1 + \cdots + d_r}$. The number of possible sequences of embeddings of this type is at most
$m^{v_K r} n^m$, since there are at most $n^m$ sequences $v_1, \dots, v_m$, and once we have chosen $\phi_1, \dots, \phi_{i-1}$ there
are certainly no more than $m^{v_K}$ ways of choosing the embedding $\phi_i$ (assuming that its image lies in
the set $\{v_1, \dots, v_m\}$). Therefore, since $m = w_1 + \cdots + w_r$, the probability that there exists a good
sequence of $r$ embeddings of this type is at most $m^{v_K r} p^{d_1 + \cdots + d_r} n^{w_1 + \cdots + w_r}$.

At this point, we use the hypothesis that $K$ is strictly balanced. It implies that

$$\frac{e_K - 1 - d_i}{v_K - k - w_i} < \frac{e_K - 1}{v_K - k},$$

which implies that $d_i/w_i > m_k(K)$. In fact, since there are only finitely many possibilities for $w_i$ and
$d_i$, it tells us that there is a constant $c' > 0$ depending on $K$ only such that $d_i \ge m_k(K)(w_i + c')$.
Since $p = Cn^{-1/m_k(K)}$, this tells us that $p^{d_i} \le C^{d_i} n^{-(w_i + c')}$, and hence that

$$m^{v_K r} p^{d_1 + \cdots + d_r} n^{w_1 + \cdots + w_r} \le m^{v_K r} C^{d_1 + \cdots + d_r} n^{-rc'}.$$

To complete the proof, let us show how to choose $C$, just to be sure that the dependences are
correct. We start by choosing $r$ such that $rc' \ge 2A$. Bearing in mind that $m \le v_K r$ and that
$d_1 + \cdots + d_r \le e_K r$, we place on $C$ the upper bound $C \le (v_K r)^{-v_K/e_K} n^{A/e_K r}$, which ensures that
$m^{v_K r} C^{d_1 + \cdots + d_r} n^{-rc'} \le n^{-A}$. Finally, we need $C$ to be large enough for $(\alpha/2) C^{e_K - 1}$ to be greater
than $(v_K r)^{v_K}$, since then the probability that there are at least $(\alpha/2) C^{e_K - 1}$ sequences is at most $n^{-A}$,
which is what we were trying to prove. Thus, we need $C$ to be at least $(2(v_K r)^{v_K}/\alpha)^{1/(e_K - 1)}$. □

To handle the case where $C > n^c$, we shall again need to appeal to the work of Janson and
Ruciński on upper tail estimates. The particular random variable we will be interested in, which
concerns hypergraphs which are rooted on two edges, is defined as follows.

Notation. Let $K$ be a $k$-uniform hypergraph and $e_1, e_2$ edges in $K$. Let $l$ be the number of edges in
$K \setminus \{e_1, e_2\}$ and let $U_1, \dots, U_l$ be random binomial subhypergraphs of the complete $k$-uniform hypergraph
$K_n^{(k)}$ on $n$ vertices, each edge being chosen with probability $p$, with characteristic functions $\chi_1, \dots, \chi_l$.
Let $S_{e_1, e_2}$ be the set consisting of all labelled ordered copies of $K \setminus \{e_1, e_2\}$ in $K_n^{(k)}$ that are rooted at
given edges $e_1'$ and $e_2'$. Then the random variable $Y_K^{e_1, e_2}$ is given by

$$Y_K^{e_1, e_2} = \sum_{s \in S_{e_1, e_2}} \prod_{1 \le i \le l} \chi_i(s_i).$$

The necessary tail estimate (which is another particular case of Corollary 4.3 in [29]) is now the
following. Note that $\mathbb{E}Y_K^{e_1, e_2}$ is essentially $p^{e_K - 2} n^{v_K - h}$, where $h$ is the size of $e_1 \cup e_2$.

Lemma 8.6. Let $K$ be a $k$-uniform hypergraph and $e_1, e_2$ fixed edges. Then there exists a constant $c$
such that the random variable $Y_K^{e_1, e_2}$ satisfies, for $\gamma \ge 2$,

$$\mathbb{P}\left(Y_K^{e_1, e_2} \ge \gamma\, \mathbb{E}Y_K^{e_1, e_2}\right) \le 2n^{v_K} \exp\left(-c \min_{L \subseteq K} (\gamma\, \mathbb{E}Y_L^{e_1, e_2})^{1/v_L}\right).$$

The required estimate for $p \ge n^{-1/m_k(K) + c}$ is an easy consequence of this lemma.

Lemma 8.7. Let $K$ be a strictly $k$-balanced $k$-uniform hypergraph and let $S$ be the set of embeddings
of $K$ into the complete $k$-uniform hypergraph $K_n^{(k)}$. Then, for any positive constants $\alpha$ and $c$, there
exist constants $b$ and $B$ such that, if $n$ is sufficiently large, $C \ge n^c$, and $p = Cn^{-1/m_k(K)}$, then, with
probability at least $1 - 2n^{v_K} e^{-bn^B}$, if $U_2, \dots, U_{e-1}$ are random subgraphs of $K_n^{(k)}$ chosen with probability
$p$ with associated measures $\mu_2, \dots, \mu_{e-1}$,

$$W(x, y) = \mathbb{E}_{s \in S_1(x) \cap S_e(y)}\, \mu_2(s_2) \cdots \mu_{e-1}(s_{e-1}) \le \alpha p n^k,$$

for all $x, y \in X$, where we have written $e$ for $e_K$.

Proof. Let $\chi_i$ be the characteristic function of $U_i$ for each $i \le e_K$. Let $a$ be the size of each
non-empty set $S_1(x) \cap S_e(y)$ and suppose $|K(x)| = t$ for each $x$. Note that $\mathbb{E}Y_K^{e_1, e_2} = p^{e_K - 2} a$. We may
apply Lemma 8.6 with $\gamma = \alpha pt$ to tell us that $\sum_{s \in S_1(x) \cap S_e(y)} \chi_2(s_2) \cdots \chi_{e-1}(s_{e-1}) \ge \alpha p^{e_K - 1} ta$ with
probability at most

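$$2n^{v_K} \exp\left(-c \min_{L \subseteq K} (\gamma\, \mathbb{E}Y_L^{e_1, e_2})^{1/v_L}\right) = 2n^{v_K} \exp\left(-c \min_{L \subseteq K} (\gamma n^{v_L - h} p^{e_L - 2})^{1/v_L}\right),$$

where $h$ is the number of vertices in the union of the first and $e$th edges. Note that, as $t$ is almost
exactly $n^{h-k}$, $\gamma n^{v_L - h} p^{e_L - 2} \ge (\alpha/2) n^{v_L - k} p^{e_L - 1}$.
Since $K$ is strictly $k$-balanced, for any proper subgraph $L$ of $K$,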




Since also $n^{v_K - k} p^{e_K - 1} \ge n^{c}$, the required bound holds with probability at least $1 - 2n^{v_K} e^{-bn^B}$ for
some constants $B$ and $b$. Since

$$W(x, y) = a^{-1} p^{-(e-2)} \sum_{s \in S_1(x) \cap S_e(y)} \chi_2(s_2) \cdots \chi_{e-1}(s_{e-1}),$$

the result now follows for $n$ sufficiently large. □

9 Summary of our results so far 

We are about to discuss several applications of our main results. In this brief section, we prepare for 
these applications by stating the abstract results that follow from the work we have done so far. Since 
not every problem one might wish to solve will give rise to a system of sequences S that either has 
two degrees of freedom or concerns copies of a strictly balanced graph or hypergraph, we begin by 
stating sufficient conditions on S for theorems of the kind we are interested in to hold. We have of 
course already done this, but since some of our earlier conditions implied other ones, there is scope 
for stating the abstract results more concisely. That way, any further applications of our methods will 
be reduced to establishing two easily stated probabilistic estimates, and showing that suitable robust 
versions of the desired results hold in the dense case. 

Having done that, we remark that we have proved that the estimates hold when S has two degrees 
of freedom or results from copies of a strictly balanced graph or hypergraph. So in these two cases, if 
the robust results hold in the dense case, then we can carry them over unconditionally to the sparse 
random case. 

The proofs in this section require little more than the putting together of results from earlier in 
the paper. 

9.1 Conditional results 

Recall that a system $S$ of sequences $s = (s_1, \dots, s_k)$ with values in a finite set $X$ is homogeneous if
for every $j \le k$ and every $x \in X$ the set $S_j(x) = \{s \in S : s_j = x\}$ has the same size. Let $S$ be a
homogeneous system of sequences with elements in a finite set $X$, and let us assume that no sequence
in $S$ has repeated elements. We shall also assume that all non-empty sets of the form $S_1(x) \cap S_k(y)$
have the same size. Coupled with our first homogeneity assumption, this implies that for each $x$ the
number of $y$ such that $S_1(x) \cap S_k(y)$ is non-empty is the same.

We are about to state and prove a theorem that is similar to Theorem 4.5, but with conditions
that are easier to check and a conclusion that is more directly what we want to prove. The first
condition is what we proved in Lemmas 7.2 and 7.4. We suppose that $X$ is a given finite set, $S$ is a
given homogeneous system of sequences with terms in X, and p is a given probability. Recall that we 
are writing ^{1) for a sufficiently large constant. 

Condition 1. Let $U_1, \dots, U_k$ be independent random subsets of $X$, each chosen binomially with
probability $p$, and let $\mu_1, \dots, \mu_k$ be their associated measures. Let $1 \le j \le k$ and for each $i \ne j$ let $\nu_i$ equal
either $\mu_i$ or the constant measure $1$ on $X$, with at least one $\nu_i$ equal to the constant measure. Then
with probability at least 1 — IXI^^^-*^), 

$$*_j(\nu_1, \dots, \nu_{j-1}, \nu_{j+1}, \dots, \nu_k)(x) \le 3/2$$

for every $x \in X$.

Recall that if $L$ is the set of $i$ such that $\nu_i = \mu_i$, then $*_j(\nu_1, \dots, \nu_{j-1}, \nu_{j+1}, \dots, \nu_k)(x)$ is $p^{-|L|} |S_j(x)|^{-1}$
times the number of $s \in S_j(x)$ such that $s_i \in U_i$ for every $i \in L$. Since the expected number of such
sequences is $p^{|L|} |S_j(x)|$, Condition 1 is saying that their number is not too much larger than its mean.
(One would usually expect a concentration result that said that their number is, with high probability,
close to its mean.)

The second condition tells us that the hypotheses of Lemma 8.4 hold. Again we shall take X, S 
and p as given. 

Condition 2. Let $U_2, \dots, U_{k-1}$ be independent random subsets of $X$, each chosen binomially with
probability $p$, and let $\mu_2, \dots, \mu_{k-1}$ be their associated measures. Let $\alpha > 0$ be an arbitrary positive constant.
For each $x$, let $t$ be the number of $y$ such that $S_1(x) \cap S_k(y)$ is non-empty. Then with probability
$1 - o(1)$,

$$W(x, y) = \mathbb{E}_{s \in S_1(x) \cap S_k(y)}\, \mu_2(s_2) \cdots \mu_{k-1}(s_{k-1}) \le \alpha pt$$

for every $x, y$ such that $S_1(x) \cap S_k(y)$ is non-empty.

This is not a concentration assumption. For instance, in the case of systems with two degrees of
freedom, it follows trivially from the fact that $|S_1(x) \cap S_k(y)| \le 1$ and each $\mu_i(s_i)$ is at most $p^{-1}$. In
more complicated cases, we end up wishing to prove that a certain integer-valued random variable 
with mean n"*^ has a probability of exceeding a large constant C. 

We are now ready to state our main conditional results. Note that in all of these it is necessary to
assume that the probability $q$ with which we choose our random set $U$ is smaller than some positive
constant $\delta$. For colouring theorems this is not a problem, because these properties are always monotone.
It is therefore enough to know that the property holds almost surely for a particular probability $q$ to
know that it holds almost surely for all probabilities larger than $q$.

For density theorems, we can also overcome this difficulty by partitioning any random set with
large probability into a number of smaller random sets each chosen with probability less than $\delta$. With
high probability, each of these smaller random sets will satisfy the required density theorem. If we
take a subset of the original set above a certain density, then this subset must have comparable density
within at least one of the sets of the partition. Applying the required density theorem within this set,
we can find the required substructure, be it an arithmetic progression of length $k$ or a complete graph
of size $t$.

Alternatively, if we know a (robust) sparse density theorem for a small value of $p$, we can deduce
it for a larger value $q$ as follows. We can pick a random set $V = X_p$ by first choosing $U = X_q$ and
then choosing $V = U_{p/q}$. Since the result is true for almost every $V = X_p$, it will be the case that for
almost every $U = X_q$, almost every $V = U_{p/q}$ will satisfy the result. It follows by a simple averaging
argument that for almost every $U = X_q$ the robust version of the density theorem holds again.
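
The two-stage choice of $V$ can be simulated directly. The Python sketch below is an illustration only (the function names, the sample size and the seed are assumptions made for the example). It chooses $U = X_q$, then $V = U_{p/q}$, and reports the empirical density of $V$, which should be close to $q \cdot (p/q) = p$.

```python
import random

def two_stage_sample(ground, p, q, rng):
    """Choose U = X_q, then V = U_{p/q}; requires 0 < p <= q <= 1."""
    U = [x for x in ground if rng.random() < q]
    V = [x for x in U if rng.random() < p / q]
    return U, V

def empirical_density(p, q, size=200_000, seed=0):
    """Fraction of the ground set that ends up in V after the two-stage choice."""
    rng = random.Random(seed)
    _, V = two_stage_sample(range(size), p, q, rng)
    return len(V) / size
```

Each element survives both stages independently with probability $p$, so $V$ has the same distribution as $X_p$ while also being a subset of $U$.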

Unfortunately, for structural results, no simple argument of this variety seems to work and we will
have to deal with each case as it comes.

Theorem 9.1. Suppose that $S$, $X$ and $p_0$ satisfy Conditions 1 and 2. Suppose also that there exist
positive constants $\rho$ and $\beta$ such that for every subset $B \subset X$ of density at least $\rho$ there are at least
$\beta|S|$ sequences $s = (s_1, \dots, s_k) \in S$ such that $s_i \in B$ for every $i$. Then, for any $\epsilon > 0$, there exist
positive constants $C$ and $\delta$ with the following property. Let $U$ be a random subset of $X$, with elements
chosen independently with probability $Cp_0 \le q \le \delta$. Then, with probability tending to $1$ as $|X|$ tends
to infinity, every subset $A$ of $U$ of relative density at least $\rho + \epsilon$ contains at least $(\beta - \epsilon) q^k |S|$ sequences such that
$s_i \in A$ for every $i$.

Proof. Basically the result is true because Theorem 4.5 proves the conclusion conditional on the three
key properties set out before the statement of Theorem 4.5, and our probabilistic arguments in the last
few sections show that the three properties follow from Conditions 1 and 2. Indeed, we would already
be done if it were not for one small extra detail: we need to deal with the fact that Theorem 4.5 has
a conclusion that concerns $m$ random sets $U_1, \dots, U_m$, whereas we want a conclusion that concerns a
single random set $U$.

Let $\eta$, $\lambda$, $d$ and $m$ be as required by Theorem 4.5. Condition 1 implies that Property 2 holds with
probability $1 - o(1)$. Lemma 8.4 tells us that Property 1 holds with probability $1 - o(1)$ provided that
Property 2 holds and the function $W$ satisfies the bound that is given to us by Condition 2, for some
$\alpha$ that depends on $\eta$, $\lambda$, $d$ and $m$. Property 0 plainly holds with high probability. Finally, Lemma 6.15
tells us that if Properties 0, 1 and 2 hold with probability $1 - o(1)$, then so does Property 3. Thus,
with probability $1 - o(1)$ we have all four properties.

It follows from Theorem 4.5 that if $U_1, \dots, U_m$ are independent random subsets of $X$, each chosen
binomially with probability $p \ge p_0$, and $\mu_1, \dots, \mu_m$ are their associated measures, then, with
probability $1 - o(1)$, $\mathbb{E}_{s \in S}\, h(s_1) \cdots h(s_k) \ge \beta - \epsilon$ whenever $0 \le h \le m^{-1}(\mu_1 + \cdots + \mu_m)$ and $\|h\|_1 \ge \rho + 3\epsilon/4$.

Let $U = U_1 \cup \cdots \cup U_m$. Then $U$ is a random set with each element chosen independently with
probability $q = 1 - (1 - p)^m \ge mp(1 - \epsilon/8)$, provided $\delta$ (and hence $p$) is sufficiently small. Let $\mu$ be
the associated measure of $U$, let $0 \le f \le \mu$ and suppose that $\|f\|_1 \ge \rho + \epsilon$. Then replacing $f$ by
the smaller function $h = \min\{f, m^{-1}(\mu_1 + \cdots + \mu_m)\}$ we have $\|h\|_1 \ge \rho + 3\epsilon/4$, which implies that
$\mathbb{E}_{s \in S} h(s_1) \cdots h(s_k) \ge \beta - \epsilon$, which in turn implies that $\mathbb{E}_{s \in S} f(s_1) \cdots f(s_k) \ge \beta - \epsilon$. □
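The inequality $q = 1 - (1 - p)^m \ge mp(1 - \epsilon/8)$ used in the proof above is easy to check numerically. The following Python sketch computes the probability that a fixed element lies in a union of $m$ independent random sets and tests the bound. The function names and the sample values of $p$, $m$ and $\epsilon$ are illustrative assumptions and do not come from the paper.

```python
def union_probability(p: float, m: int) -> float:
    """Probability that a fixed element lies in U_1 ∪ ... ∪ U_m when each U_i
    contains it independently with probability p."""
    return 1.0 - (1.0 - p) ** m


def lower_bound_holds(p: float, m: int, eps: float) -> bool:
    """Check the inequality q >= m * p * (1 - eps / 8) for q = union_probability(p, m)."""
    return union_probability(p, m) >= m * p * (1.0 - eps / 8.0)


if __name__ == "__main__":
    # Hypothetical sample values: the bound fails for large p and holds for small p.
    for p in (1e-1, 1e-2, 1e-3):
        print(p, union_probability(p, 5), lower_bound_holds(p, 5, 0.5))
```

The bound can fail when $p$ is large relative to $\epsilon/m$, which is why the proof needs $p$ to be sufficiently small.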

Conditions 1 and 2 also imply an abstract colouring result and an abstract structural result in a 
very similar way. 

Theorem 9.2. Suppose that $S$, $X$ and $p_0$ satisfy Conditions 1 and 2. Suppose also that $r$ is a positive
integer and $\beta$ a positive constant such that for every colouring of $X$ with $r$ colours there are at least
$\beta|S|$ sequences $s = (s_1, \dots, s_k) \in S$ such that each $s_i$ has the same colour. Then there exist positive
constants $C$ and $\delta$ with the following property. Let $U$ be a random subset of $X$, with elements chosen
independently with probability $Cp_0 \le q \le \delta$. Then, with probability $1 - o(1)$, for every colouring of $U$
with $r$ colours there are at least $2^{-(k+2)} \beta q^k |S|$ sequences $s = (s_1, \dots, s_k) \in S$ such that each $s_i$ has the
same colour and each $s_i$ is an element of $U$.

The only further ingredient needed to prove this theorem is Theorem 4.8. Other than this, the 
proof is much the same as that of Theorem 9.1. 

Theorem 9.3. Suppose that $S$, $X$ and $p_0$ satisfy Conditions 1 and 2 and let $\mathcal{V}$ be a collection of $2^{o(q|X|)}$
subsets of $X$. Then, for any $\epsilon > 0$, there exist positive constants $C$ and $\delta$ with the following property.
Let $U$ be a random subset of $X$, with elements chosen independently with probability $Cp_0 \le q \le \delta$,
and let $\mu$ be the associated measure of $U$. Then, with probability $1 - o(1)$, for every function
$f : X \to \mathbb{R}$ with $0 \le f \le \mu$ there exists a function $g : X \to \mathbb{R}$ with $0 \le g \le 1$ such that

\[ \mathbb{E}_{s \in S} f(s_1) \cdots f(s_k) \ge \mathbb{E}_{s \in S} g(s_1) \cdots g(s_k) - \epsilon \]

and, for all $V \in \mathcal{V}$,

\[ \Bigl| \sum_{x \in V} f(x) - \sum_{x \in V} g(x) \Bigr| \le \epsilon |X|. \]

The main extra point to note here is that Conditions 1 and 2 imply not just assumption 3 but also 
assumption 3'. This allows us to apply Theorem 4.10. 

9.2 The critical exponent 

The aim of this paper has been to prove results that are best possible to within a constant. A 
preliminary task is to work out the probability below which we cannot hope to prove a result. (For 
density problems, it is easy to prove that below this probability the result is not even true. For natural 
colouring problems, it usually seems to be the case that the result is not true, but the known proofs 
are far from trivial.) To within a constant, the probability in question is the probability p such that 
the following holds: for each $j \le k$ and each $x \in X$ the expected number of elements $s \in S$ such that
$s_j = x$ (that is, such that $s \in S_j(x)$) and $s_i$ belongs to $X_p$ for each $i \ne j$ is equal to 1.

In concrete situations, $X$ will be one of a family of sets of increasing size, and $S$ will be one of a
corresponding family of sets of sequences. Then it is usually the case that the probability $p$ calculated
above is within a constant of $|X|^{-\alpha}$ for some rational number $\alpha$ that does not depend on which member
of the family one is talking about. In this situation, we shall call $\alpha$ the critical exponent for the family
of problems. Our results will then be valid for all $p$ that exceed $C|X|^{-\alpha}$ for some constant $C$. We shall
denote the critical exponent by $\alpha_S$, even though strictly speaking it depends not on an individual $S$
but on the entire family of sets of sequences.

To give an example, if $S$ consists of all nondegenerate edge-labelled copies of $K_4$ in $K_n$, then the
expected number of copies with a particular edge in a particular place, given that that edge belongs to
$U$, is $2(n - 2)(n - 3)p^5$ (since each $S_j(e)$ has size $2(n - 2)(n - 3)$ and there are five edges that must be
chosen). Setting that equal to 1 tells us that $p$ is within a constant of $n^{-2/5}$, so the critical exponent
is $2/5$. (This is a special case of the formula $1/m_k(K) = (v_K - k)/(e_K - 1)$.)
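The formula $1/m_k(K) = (v_K - k)/(e_K - 1)$ can be evaluated exactly with rational arithmetic. The sketch below does this for a hypergraph given by its number of vertices, number of edges and uniformity. The function name and argument names are illustrative assumptions.

```python
from fractions import Fraction


def critical_exponent(num_vertices: int, num_edges: int, uniformity: int = 2) -> Fraction:
    """Return 1/m_k(K) = (v_K - k) / (e_K - 1) for a k-uniform hypergraph K
    with v_K vertices and e_K edges."""
    if num_edges < 2:
        raise ValueError("the formula needs a hypergraph with at least two edges")
    return Fraction(num_vertices - uniformity, num_edges - 1)


if __name__ == "__main__":
    print(critical_exponent(4, 6))  # K_4: four vertices, six edges
    print(critical_exponent(3, 3))  # triangle: three vertices, three edges
```

For $K_4$ this reproduces the exponent $2/5$ computed in the example above.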

This calculation is exactly what we do in general: if each element of $S$ is a sequence of length $k$
and we are given that $x \in X_p$, then the expected number of elements of $S_j(x)$ that have all their terms
in $X_p$ is $p^{k-1}|S_j(x)|$. This equals 1 when $p = |S_j(x)|^{-1/(k-1)}$. If $|S_j(x)| = C|X|^{\theta}$ for some $\theta$ that is
independent of the size of the problem, then the critical exponent is therefore $\theta/(k - 1)$.

If we can prove a robust density theorem for $S$, and can show that Conditions 1 and 2 hold when
$p = C|X|^{-\alpha_S}$ for some constant $C$, then we have proved a result that is best possible to within a
constant. For colouring theorems, we cannot be quite so sure that the result is best possible, but in
almost all examples where the 0-statement has been proved, it does indeed give a bound of the form
$c|X|^{-\alpha_S}$.

9.3 Unconditional results 

In this section we concentrate on the two kinds of sequence system for which we have proved that
Conditions 1 and 2 hold when $p = C|X|^{-\alpha_S}$.

Recall that $S$ has two degrees of freedom if $S_i(x) \cap S_j(y)$ is either empty or a singleton whenever
$i \ne j$. A good example of such a system is the set of arithmetic progressions of length $k$ in $\mathbb{Z}_p$ for some
prime $p$. We say that $S$ is a set of copies of a hypergraph $K$ if $K$ is a $k$-uniform hypergraph with edges
$a_1, \dots, a_e$, $X$ is the complete $k$-uniform hypergraph $K_n^{(k)}$ and $S$ is the set of all sequences of the form
$(\phi(a_1), \dots, \phi(a_e))$, where $\phi$ is an injection from the vertices of $K$ to $\{1, 2, \dots, n\}$.

As above, we assume that $S$ has the additional homogeneity property that $S_1(x) \cap S_k(y)$ always
has the same size when it is non-empty. (In the hypergraph case, $e$ plays the role of $k$ and $k$ has a
different meaning: thus, the property in that case is that $S_1(x) \cap S_e(y)$ always has the same size when
it is non-empty.) And in the hypergraph case, we make the further assumption that the hypergraph $K$
is strictly balanced, which means that for every proper subhypergraph $J \subset K$ we have the inequality
$m_k(J) < m_k(K)$, where $m_k(K)$ is shorthand for $(e_K - 1)/(v_K - k)$.

Given a system $S$ with two degrees of freedom, let $t$ be the size of each $S_j(x)$, and suppose that
$t = |X|^{\gamma}$. Then the critical exponent of $S$ is $\gamma/(k - 1)$. (Note that $|X|^{-\gamma/(k-1)} = t^{-1/(k-1)}$.) When
$S$ is a set of copies of a strictly balanced hypergraph $K$, the critical exponent is $1/m_k(K)$. It is
straightforward to show that sparse density results cannot hold for random subsets of $X$ chosen with
probability $c|X|^{-\alpha_S}$ if $c$ is a sufficiently small positive constant. Broadly speaking, we shall show
that they do hold for random subsets chosen with probability $C|X|^{-\alpha_S}$ when $C$ is a sufficiently large
positive constant.

Let us call a system $S$ good if the above properties hold. That is, roughly speaking, a good system
is a system with certain homogeneity properties that either has two degrees of freedom or comes from
copies of a graph or hypergraph. We shall also assume that $|X|$ is sufficiently large. When we say
"there exists a constant C," this should be understood to depend only on k in the case of systems of 
two degrees of freedom, and only on K in the case of copies of a strictly balanced hypergraph, together 
with parameters such as density or the number of colours in a colouring that have been previously 
mentioned in the statement. 

Theorem 9.4. Let $X$ be a finite set and let $S$ be a good system of ordered subsets of $X$. Suppose that
there exist positive constants $\rho$ and $\beta$ such that for every subset $B \subset X$ of density at least $\rho$ there are
at least $\beta|S|$ sequences $s = (s_1, \dots, s_k) \in S$ such that $s_i \in B$ for every $i$. Then, for any $\epsilon > 0$, there
exist positive constants $C$ and $\delta$ with the following property. Let $U$ be a random subset of $X$, with
elements chosen independently with probability $C|X|^{-\alpha_S} \le p \le \delta$. Then, with probability tending to 1
as $|X|$ tends to infinity, every subset $A$ of $U$ of size at least $(\rho + \epsilon)|U|$ contains at least $(\beta - \epsilon)p^k|S|$
sequences such that $s_i \in A$ for every $i$.

Proof. By Theorem 9.1, all we have to do is check Conditions 1 and 2. Condition 1 is given to us
by Lemma 7.2 when $S$ has two degrees of freedom, and by Lemma 7.4 when $S$ is a system of copies
of a graph or hypergraph, even when $C = 1$. (In the case where $S$ has two degrees of freedom,
see the remarks following Lemma 7.2 for an explanation of why the result implies Condition 1 when
$p = |X|^{-\alpha_S}$.)

When $S$ has two degrees of freedom, Condition 2 holds as long as $p^{-(k-2)} \le \alpha p t$, as we have already
remarked. This tells us that $p$ needs to be at least $(1/\alpha t)^{1/(k-1)}$. In this case, $t = |S_j(x)|$ for each $x$ and
$j$, so $(1/\alpha t)^{1/(k-1)}$ is within a constant of $|X|^{-\alpha_S}$, as required. When $S$ comes from copies of a strictly
balanced graph or hypergraph, Lemmas 8.5 and 8.7 give us Condition 2, again with $p = C|X|^{-\alpha_S}$. □

Exactly the same proof (except that we use Theorem 9.2 instead of Theorem 9.1) gives us the 
following general sparse colouring theorem. 
Theorem 9.5. Let $X$ be a finite set and let $S$ be a good system of ordered subsets of $X$. Suppose that $r$
is a positive integer and that $\beta$ is a positive constant such that for every colouring of $X$ with $r$ colours
there are at least $\beta|S|$ sequences $s = (s_1, \dots, s_k) \in S$ such that each $s_i$ has the same colour. Then
there exist positive constants $C$ and $\delta$ with the following property. Let $U$ be a random subset of $X$,
with elements chosen independently with probability $C|X|^{-\alpha_S} \le p \le \delta$. Then, with probability $1 - o(1)$,
for every colouring of $U$ with $r$ colours there are at least $2^{-(k+2)} \beta p^k |S|$ sequences $s = (s_1, \dots, s_k) \in S$
such that each $s_i$ has the same colour and each $s_i$ is an element of $U$.

Finally, we have the following general sparse structural theorem.

Theorem 9.6. Let $X$ be a finite set and let $S$ be a good system of ordered subsets of $X$. Then, for any
$\epsilon > 0$, there exist positive constants $C$ and $\delta$ with the following property. Let $U$ be a random subset
of $X$, with elements chosen independently with probability $C|X|^{-\alpha_S} \le p \le \delta$, let $\mu$ be the associated
measure of $U$ and let $\mathcal{V}$ be a collection of $2^{o(p|X|)}$ subsets of $X$. Then, with probability $1 - o(1)$, for
every function $f$ with $0 \le f \le \mu$ there exists a function $g$ with $0 \le g \le 1$ such that

\[ \mathbb{E}_{s \in S} f(s_1) \cdots f(s_k) \ge \mathbb{E}_{s \in S} g(s_1) \cdots g(s_k) - \epsilon \]

and, for all $V \in \mathcal{V}$,

\[ \Bigl| \sum_{x \in V} f(x) - \sum_{x \in V} g(x) \Bigr| \le \epsilon |X|. \]

In applications, we often want g to take values in {0, 1} rather than [0, 1]. This can be achieved 
by a simple and standard modification of the above result. 

Corollary 9.7. Let $X$ be a finite set and let $S$ be a good system of ordered subsets of $X$, each of size
$k$. Then, for any $\epsilon > 0$, there exist positive constants $C$ and $\delta$ with the following property. Let $U$ be a
random subset of $X$, with elements chosen independently with probability $C|X|^{-\alpha_S} \le p \le \delta$, let $\mu$ be
the associated measure of $U$ and let $\mathcal{V}$ be a collection of $2^{o(p|X|)}$ subsets of $X$. Then, with probability
$1 - o(1)$, for every function $f$ with $0 \le f \le \mu$ there exists a function $h$ taking values in $\{0, 1\}$ such
that

\[ \mathbb{E}_{s \in S} f(s_1) \cdots f(s_k) \ge \mathbb{E}_{s \in S} h(s_1) \cdots h(s_k) - \epsilon \]

and, for all $V \in \mathcal{V}$,

\[ \Bigl| \sum_{x \in V} f(x) - \sum_{x \in V} h(x) \Bigr| \le \epsilon |X|. \]

Proof. The basic idea of the argument is to choose a function $g$ that satisfies the conclusion of Theorem
9.6 with $\epsilon$ replaced by $\epsilon/2$, and to let $h(x) = 1$ with probability $g(x)$ and $h(x) = 0$ with probability
$1 - g(x)$, all choices being made independently. Then concentration of measure tells us that with high
probability the estimates are not affected very much.

Note first that the expectation of $\mathbb{E}_{s \in S} h(s_1) \cdots h(s_k)$ is $\mathbb{E}_{s \in S} g(s_1) \cdots g(s_k)$. By how much can
changing the value of $h(x)$ change the value of $\mathbb{E}_{s \in S} h(s_1) \cdots h(s_k)$? Well, if $x$ is one of $s_1, \dots, s_k$ then
$h(s_1) \cdots h(s_k)$ can change by at most 1 and otherwise it does not change. The probability that $x$ is one
of $s_1, \dots, s_k$ is $k/|X|$, by the homogeneity of $S$ (which tells us that each $s_j$ is uniformly distributed). By
Azuma's inequality it follows that the probability that $|\mathbb{E}_{s \in S} g(s_1) \cdots g(s_k) - \mathbb{E}_{s \in S} h(s_1) \cdots h(s_k)| \ge \epsilon/2$
is at most $2\exp(-\epsilon^2|X|/8k^2)$. This gives us the first conclusion with very high probability.

The second is obtained in a similar way. For each $V \in \mathcal{V}$ the probability that
$|\sum_{x \in V} h(x) - \sum_{x \in V} g(x)| \ge \epsilon|X|/2$ is, by Azuma's inequality, at most $2\exp(-\epsilon^2|X|/8)$. Since there are
$2^{o(p|X|)}$ sets in $\mathcal{V}$, a union bound gives the second conclusion with very high probability as well. □
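The rounding step in this proof is simple to simulate. The sketch below, in Python, rounds a function $g$ with values in $[0, 1]$ to a random $\{0, 1\}$-valued function $h$ and evaluates the bound $2\exp(-\epsilon^2|X|/8k^2)$ quoted above. Representing $g$ as a list indexed by the elements of $X$, the function names, and the sample values are all assumptions made for illustration.

```python
import math
import random


def randomized_rounding(g, rng):
    """Round g: X -> [0, 1] to h: X -> {0, 1}, setting h(x) = 1 with
    probability g(x), independently for each x."""
    return [1 if rng.random() < gx else 0 for gx in g]


def azuma_failure_bound(eps, size_x, k):
    """The bound 2 * exp(-eps^2 |X| / (8 k^2)) from the first half of the proof."""
    return 2.0 * math.exp(-(eps ** 2) * size_x / (8.0 * k ** 2))


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed, purely for reproducibility
    g = [0.3] * 10_000       # hypothetical constant function g
    h = randomized_rounding(g, rng)
    print(abs(sum(h) - sum(g)) / len(g))          # normalized deviation of the sums
    print(azuma_failure_bound(0.1, len(g), k=3))  # bound on the failure probability
```

Points where $g$ is already 0 or 1 are never changed by the rounding, so only the fractional values of $g$ contribute to the deviation.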
10 Applications



10.1 Density results 

As a first example, let us prove Theorem 1.12, the sparse analogue of Szemeredi's theorem. We shall
consider Szemeredi's theorem as a statement about arithmetic progressions mod $p$ in dense subsets of
$\mathbb{Z}_p$ for a prime $p$. We do this because the set of mod-$p$ arithmetic progressions of length $k$ in $\mathbb{Z}_p$ is a
homogeneous system with two degrees of freedom. However, once we have the result for this version
of Szemeredi's theorem, we can easily deduce it for the more conventional version concerning a sparse
random subset of $[n]$. We simply choose a prime $p$ between $2n$ and $4n$, pick a sparse random subset
$U$ of $\mathbb{Z}_p$, and then apply the result to subsets of $U$ that happen to be subsets of $\{1, 2, \dots, n\}$, since
arithmetic progressions in these subsets will not wrap around. Similar arguments allow us to replace
$[n]$ by $\mathbb{Z}_p$ for our later applications, so we mention once and for all now that for each application
it is easy to deduce from the result we state a result for sparse subsets of intervals (or grids in the
multidimensional case).

Since we wish to use the letter p to denote a probability, we shall now let n be a large prime. 

By Theorem 9.1, all we have to do is check the robust version of Szemeredi's theorem, which can
be proved by a simple averaging argument, originally observed by Varnavides [62] (who stated it for
progressions of length 3).

Theorem 10.1. Let $k$ be an integer and $\delta > 0$ a real number. Then there exists an integer $n_0$ and
$c > 0$ such that if $n$ is a prime greater than or equal to $n_0$ and $B$ is a subset of $\mathbb{Z}_n$ with $|B| \ge \delta n$ then
$B$ contains at least $cn^2$ arithmetic progressions of length $k$.

Proof. Let $m$ be such that every subset of $\{1, 2, \dots, m\}$ of density $\delta/2$ contains an arithmetic
progression of length $k$. Now let $B$ be a subset of $\mathbb{Z}_n$ of density $\delta$. For each $a$ and $d$ with $d \ne 0$ let $P_{a,d}$ be
the mod-$n$ arithmetic progression $\{a, a + d, \dots, a + (m - 1)d\}$. Let $a$ and $d$ be random elements of $\mathbb{Z}_n$
with $d \ne 0$.

If we choose $P_{a,d}$ at random in this way, then the expected density of $B$ inside $P_{a,d}$ is $\delta$, so with probability
at least $\delta/2$ it is at least $\delta/2$. It follows with probability at least $\delta/2$ that $P_{a,d}$ contains an arithmetic
progression that is contained in $B$. Since $P_{a,d}$ contains at most $m^2$ arithmetic progressions of length $k$,
it follows that with probability at least $\delta/2$ at least $m^{-2}$ of the progressions of length $k$ inside $P_{a,d}$ are
contained in $B$. But every mod-$n$ arithmetic progression of length $k$ is contained in the same number
of progressions $P_{a,d}$. Therefore, the number of progressions in $B$ is at least $(\delta m^{-2}/2)n^2$. □
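For small $n$, the quantity bounded in Theorem 10.1 can be computed by brute force. The sketch below counts pairs $(a, d)$ with $d \ne 0$ such that the mod-$n$ progression $a, a + d, \dots, a + (k - 1)d$ lies in $B$. Counting ordered pairs $(a, d)$, so that a progression and its reversal are counted separately, is a convention chosen here for illustration, as is the function name.

```python
def count_progressions(B, n, k):
    """Count pairs (a, d) with d != 0 such that a, a+d, ..., a+(k-1)d
    all lie in B, with arithmetic taken mod n."""
    members = set(x % n for x in B)
    count = 0
    for a in members:
        for d in range(1, n):
            if all((a + i * d) % n in members for i in range(1, k)):
                count += 1
    return count


if __name__ == "__main__":
    n, k = 7, 3
    print(count_progressions(range(n), n, k))  # all of Z_7: n * (n - 1) pairs
    print(count_progressions({0, 1, 2}, n, k))
```

When $B$ is all of $\mathbb{Z}_n$ the count is $n(n - 1)$, which is the quantity of order $n^2$ against which the bound $cn^2$ should be compared.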

Very similar averaging arguments are used to prove the other robust density results we shall need 
in this subsection, so we shall be sketchy about the proofs and sometimes omit them altogether. 

The next result is the sparse version of Szemeredi's theorem. Recall that we write $X_p$ for a random
subset of $X$ with each element chosen independently with probability $p$, and that we say that a set
$U$ is $(k, \delta)$-Szemeredi if every subset $B \subset U$ of size at least $\delta|U|$ contains an arithmetic progression of
length $k$.

Theorem 10.2. Given $\delta > 0$ and a natural number $k \ge 3$, there exists a constant $C$ such that if
$p \ge Cn^{-1/(k-1)}$, then the probability that $(\mathbb{Z}_n)_p$ is $(k, \delta)$-Szemeredi is $1 - o(1)$.

Proof. In the case where $p$ is not too large, this follows immediately from Theorems 9.4 and 10.1.
The result for larger probabilities can be deduced by using the argument given before Theorem 9.1.
Alternatively, note that a subset of relative density $\delta$ within $[n]_p$ has density roughly $\delta p$ in $[n]$. So
if $p$ is larger than a fixed constant $\lambda$ (as it will be in the case not already covered by Theorem 9.4),
we can just apply Szemeredi's theorem itself. □



A simple corollary of Theorem 10.2 is a sparse analogue of van der Waerden's theorem [61] on
arithmetic progressions in $r$-colourings of $[n]$. Note that this theorem was proved much earlier by Rodl
and Rucinski [46] and is known to be tight.

Let us now prove sparse versions of two generalizations of Szemeredi's theorem. The first general- 
ization is the multidimensional Szemeredi theorem, due to Furstenberg and Katznelson [13]. We shall 
state it in its robust form, which is in fact the statement that Furstenberg and Katznelson directly 
prove. (It also follows from the non-robust version by means of an averaging argument.) 

Theorem 10.3. Let $r$ be a positive integer and $\delta > 0$ be a real number. If $P \subset \mathbb{Z}^r$ is a fixed set,
then there is a positive integer $n_0$ and a constant $c > 0$ such that, for $n \ge n_0$, every subset $B$ of the
grid $[n]^r$ with $|B| \ge \delta n^r$ contains $cn^{r+1}$ subsets of the form $a + dP$, where $a \in [n]^r$ and $d$ is a positive
integer.

Just as with Szemeredi's theorem, this statement is equivalent to the same statement for subsets
of $\mathbb{Z}_n^r$. So let $P$ be a subset of $\mathbb{Z}^r$ and let $(x_1, \dots, x_k)$ be an ordering of the elements of $P$. Let $S$ be
the set of sequences of the form $s_{a,d} = (a + x_1 d, \dots, a + x_k d)$ with $d \ne 0$. Then $S$ is homogeneous
and has two degrees of freedom. Moreover, if $n$ is large enough, then no two elements of $s_{a,d}$ are the
same. From the conclusion of Theorem 10.3 it follows that there are at least $c|S|$ sequences in $S$ with
all their terms in $B$. We have therefore checked all the conditions for Theorem 9.4, so we have the
following sparse version of the multidimensional Szemeredi theorem. (As before, the result for larger
$p$ follows easily from the result for smaller $p$.) We define a subset $U$ of $\mathbb{Z}_n^r$ to be $(P, \delta)$-Szemeredi if
every subset $B$ of $U$ of size at least $\delta|U|$ contains a homothetic copy $a + dP$ of $P$.

Theorem 10.4. Given integers $r$ and $k$, a real number $\delta > 0$ and a subset $P \subset \mathbb{Z}^r$ of size $k$, there
exists a constant $C$ such that if $p \ge Cn^{-1/(k-1)}$, then the probability that $(\mathbb{Z}_n^r)_p$ is $(P, \delta)$-Szemeredi is
$1 - o(1)$.

The second generalization of Szemeredi's theorem we wish to look at is the polynomial Szemeredi 
theorem of Bergelson and Leibman [1] . Their result is the following. 

Theorem 10.5. Let $\delta > 0$ be a real number, let $k$ be a positive integer and let $P_1, \dots, P_k$ be polynomials
with integer coefficients that vanish at zero. Then there exists an integer $n_0$ and a constant $c > 0$
such that if $n \ge n_0$ and $B$ is a subset of $[n]$ such that $|B| \ge \delta n$ then $B$ has a subset of the form
$\{a, a + P_1(d), \dots, a + P_k(d)\}$.

We will focus on the specific case where the polynomials are $x^r, 2x^r, \dots, (k - 1)x^r$ (so $k$ has been
replaced by $k - 1$). In this case, the theorem tells us that we can find an arithmetic progression of
length k with common difference that is a perfect rth power. We restrict to this case, because it is 
much easier to state and prove an appropriate robust version for this case than it is for the general 
case. 

Note that if $a, a + d^r, \dots, a + (k - 1)d^r \in [n]$, then $d \le (n/k)^{1/r}$. This observation and another easy
averaging argument enable us to replace Theorem 10.5 by the following equivalent robust statement
about subsets of $\mathbb{Z}_n$ (see, for example, [23]).
Corollary 10.6. Let $k$, $r$ be integers and $\delta > 0$ a real number. Then there exists an integer $n_0$ and
a constant $c > 0$ such that if $n \ge n_0$ and $B$ is a subset of $\mathbb{Z}_n$ such that $|B| \ge \delta n$ then $B$ contains at
least $cn^{1+1/r}$ pairs $(a, d)$ such that $a, a + d^r, \dots, a + (k - 1)d^r \in B$ and $d \le (n/k)^{1/r}$.

Let us say that a subset $U$ of $\mathbb{Z}_n$ is $(k, r, \delta)$-Szemeredi if in every subset $B$ of $U$ of size at least $\delta|U|$
there exists a progression of length $k$ of the form $a, a + d^r, \dots, a + (k - 1)d^r$ with $d \le (n/k)^{1/r}$.

Theorem 10.7. Let $k$, $r$ be integers and $\delta > 0$ a real number. Then there exists a constant $C$ such
that if $p \ge Cn^{-1/((k-1)r)}$, then the probability that $(\mathbb{Z}_n)_p$ is $(k, r, \delta)$-Szemeredi is $1 - o(1)$.

Proof. Let $X = \mathbb{Z}_n$ and $S$ be the collection of progressions of the form $a, a + d^r, \dots, a + (k - 1)d^r$
with $d \le (n/k)^{1/r}$. Because of this restriction on $d$, $S$ has two degrees of freedom. It is also obviously
homogeneous. The size of each $S_j(x)$ is $n^{1/r}$ to within a constant, so the critical exponent is $\gamma/(k - 1)$
with $\gamma = 1/r$. Therefore, provided $p$ is at most some constant $\lambda$, the result follows from Theorem 9.4
and Corollary 10.6. For $p$ larger than $\lambda$, the result follows from the polynomial Szemeredi theorem itself. □

Note that the particular case of this theorem when $k = 2$ was already proved by Nguyen [40]. To
see that this result is sharp, note that the number of progressions of length $k$ in the random set is
roughly $p^k n^{1+1/r}$. This is smaller than the number of vertices $pn$ when $p = n^{-1/((k-1)r)}$.
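The sharpness heuristic above compares two expected counts, and the comparison can be made concrete. The sketch below estimates the expected number of progressions as $p^k \cdot n \cdot (n/k)^{1/r}$, using the restriction $d \le (n/k)^{1/r}$, and compares it with the expected size $pn$ of the random set. The function names are hypothetical, and the counts are rough estimates that ignore rounding of $d$ to an integer.

```python
import math


def threshold_probability(n, k, r):
    """The probability n^(-1/((k-1) r)) appearing in Theorem 10.7 (with C = 1)."""
    return n ** (-1.0 / ((k - 1) * r))


def expected_progressions(n, k, r, p):
    """Rough expected number of progressions a, a+d^r, ..., a+(k-1)d^r in (Z_n)_p:
    about n * (n/k)^(1/r) pairs (a, d), each surviving with probability p^k."""
    return p ** k * n * (n / k) ** (1.0 / r)


def expected_size(n, p):
    """Expected number of elements of (Z_n)_p."""
    return p * n


if __name__ == "__main__":
    n, k, r = 10 ** 6, 3, 2
    p = threshold_probability(n, k, r)
    print(expected_progressions(n, k, r, p) / expected_size(n, p))
```

Under these estimates the ratio at the threshold is $k^{-1/r}$, a constant below 1, and it shrinks further if $p$ is multiplied by a small constant.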

We will now move on to proving sparse versions of Turan's theorem for strictly $k$-balanced
$k$-uniform hypergraphs. As we mentioned in the introduction, some of the dense results are not known,
but this does not matter to us, since our aim is simply to show that whatever results can be proved 
in the dense case carry over to the sparse random case when the probability exceeds the critical 
probability. 

For a $k$-uniform hypergraph $K$, let $\mathrm{ex}(n, K)$ denote the largest number of edges a subgraph of $K_n^{(k)}$
can have without containing a copy of $K$. As usual, we need a robust result that says that once a
graph has more edges than the extremal number for $K$, by a constant proportion of the total number
of edges in $K_n^{(k)}$, then it must contain many copies of $K$. The earliest version of such a supersaturation
result was proved by Erdos and Simonovits [5]. The proof is another easy averaging argument along
the lines of the proof of Theorem 10.1.

Theorem 10.8. Let $K$ be a $k$-uniform hypergraph. Then, for any $\epsilon > 0$, there exists $\delta > 0$ such that
if $L$ is a $k$-uniform hypergraph on $n$ vertices and

\[ e(L) > \mathrm{ex}(n, K) + \epsilon n^k, \]

then $L$ contains at least $\delta n^{v_K}$ copies of $K$.

We will say that a $k$-uniform hypergraph $H$ is $(K, \epsilon)$-Turan if any subset of the edges of $H$ of size

\[ \left( \frac{\mathrm{ex}(n, K)}{\binom{n}{k}} + \epsilon \right) e(H) \]

contains a copy of $K$. Recall that $m_k(K) = (e_K - 1)/(v_K - k)$. We shall write $G^{(k)}_{n,p}$ for a random
$k$-uniform hypergraph on $n$ vertices, where each edge is chosen with probability $p$.
Theorem 10.9. For every $\epsilon > 0$ and every strictly $k$-balanced $k$-uniform hypergraph $K$, there exists
a constant $C$ such that if $p \ge Cn^{-1/m_k(K)}$, then the probability that $G^{(k)}_{n,p}$ is $(K, \epsilon)$-Turan is $1 - o(1)$.

Proof. For $p$ smaller than a fixed constant $\lambda$, the result follows immediately from Theorems 9.4 and
10.8. For $p \ge \lambda$, we may apply the argument discussed before Theorem 9.1. That is, we may partition
$G^{(k)}_{n,p}$ into a small set of random graphs, each of which has density less than $\lambda$ and each of which is
$(K, \epsilon)$-Turan. If now we have a subgraph of $G^{(k)}_{n,p}$ of relative density at least $\mathrm{ex}(n, K)/\binom{n}{k} + \epsilon$, then this
subgraph must have at least this density in one of the graphs from the partition. Applying the fact that
this graph from the partition is $(K, \epsilon)$-Turan implies the result. □

In particular, this implies Theorem 1.10, which is the particular case of this theorem where $K$ is
a strictly balanced graph. Then $\mathrm{ex}(n, K)$ is known to be $\left(1 - \frac{1}{\chi(K) - 1} + o(1)\right)\binom{n}{2}$, where $\chi(K)$ is the
chromatic number of $K$.



10.2 Colouring results 

We shall now move on to colouring results that do not follow from their corresponding density versions. 
Let us begin with Ramsey's theorem. As ever, the main thing we need to check is that a suitable 
robust version of the theorem holds. And indeed it does: it is a very simple consequence of Ramsey's 
theorem that was noted by Erdos [4] . 

Theorem 10.10. Let $H$ be a hypergraph and let $r$ be a positive integer. Then there exists an integer
$n_0$ and a constant $c > 0$ such that, if $n \ge n_0$, any colouring of the edges of $K_n^{(k)}$ with $r$ colours is
guaranteed to contain $cn^{v_H}$ monochromatic copies of $H$.

Once again the proof is the obvious averaging argument: choose $m$ such that if the edges of $K_m^{(k)}$
are coloured with $r$ colours, there must be a monochromatic copy of $H$, and then a double count shows
that for every $r$-colouring of the edges of $K_n^{(k)}$ there are at least $\binom{n}{m} / \binom{n - v_H}{m - v_H}$ monochromatic copies of $H$,
where $v_H$ is the number of vertices of $H$.
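The double count can be made explicit: each of the $\binom{n}{m}$ vertex sets of size $m$ contains a monochromatic copy of $H$, and a fixed copy on $v$ vertices lies in $\binom{n-v}{m-v}$ such sets. The following Python sketch evaluates the resulting lower bound. Writing $v$ for the number of vertices of $H$, the function name, and the choice of example (triangles and two colours, where one may take $m = 6$) are assumptions made here for illustration.

```python
from fractions import Fraction
from math import comb


def monochromatic_copies_lower_bound(n: int, m: int, v: int) -> Fraction:
    """Double-counting bound: C(n, m) vertex sets of size m each contain a
    monochromatic copy, and each copy on v vertices lies in C(n - v, m - v) of them."""
    return Fraction(comb(n, m), comb(n - v, m - v))


if __name__ == "__main__":
    # Triangles (v = 3) in 2-colourings of K_10, using m = 6.
    print(monochromatic_copies_lower_bound(10, 6, 3))
```

Since $\binom{n}{m}\binom{m}{v} = \binom{n}{v}\binom{n-v}{m-v}$, the bound equals $\binom{n}{v}/\binom{m}{v}$, which is of order $n^{v}$ for fixed $m$ and $v$, as Theorem 10.10 requires.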

Recall that, given a $k$-uniform hypergraph $K$ and a natural number $r$, a hypergraph is
$(K, r)$-Ramsey if every $r$-colouring of its edges contains a monochromatic copy of $K$. We are now ready to
prove Theorem 1.9, which for convenience we restate here.

Theorem 10.11. Given a natural number $r$ and a strictly $k$-balanced $k$-uniform hypergraph $K$, there
exists a positive constant $C$ such that if $p \ge Cn^{-1/m_k(K)}$, then the probability that $G^{(k)}_{n,p}$ is
$(K, r)$-Ramsey is $1 - o(1)$.

Proof. For a sufficiently large constant $C$ the result for $p = Cn^{-1/m_k(K)}$ follows from Theorems 9.5
and 10.10. If $p \le q$ and we know the result for $G^{(k)}_{n,p}$, then the result for $G^{(k)}_{n,q}$ follows easily. Given
an $r$-colouring of $G^{(k)}_{n,q}$, randomly select the edges of $G^{(k)}_{n,q}$, each with probability $p/q$. The resulting
hypergraph is distributed as $G^{(k)}_{n,p}$, so with probability $1 - o(1)$ it is $(K, r)$-Ramsey, and therefore
contains a monochromatic copy of $K$. □

With only slightly more effort we can obtain a robust conclusion. Theorem 9.5 tells us that with
high probability the number of monochromatic copies of $K$ in any $r$-colouring of $G^{(k)}_{n,p}$ is at least $cp^{e_K}n^{v_K}$ for
some constant $c > 0$, and then an averaging argument implies that with high probability the number
of monochromatic copies in an $r$-colouring of $G^{(k)}_{n,q}$ is at least $cq^{e_K}n^{v_K}$.
We shall now take a look at Schur's theorem [53], which states that if the set $\{1, \dots, n\}$ is
$r$-coloured, then there exist monochromatic subsets of the form $\{x, y, x + y\}$. As with our results
concerning Szemeredi's theorem, it is more convenient to work in $\mathbb{Z}_n$. To see that this implies the
equivalent theorem in $[n]$, let $[n]_p$ be a random subset of $[n]$ made from the union of two smaller
random subsets, each chosen with probability $q$ such that $p = 2q - q^2$. Call these sets $U_1$ and $U_2$.
Then the subset of $\mathbb{Z}_{2n}$ formed by placing the set $U_1$ in the position $\{1, \dots, n\}$ and the set $-U_2$ in the
set $\{-1, \dots, -n\}$ (the overlap $n = -n$ is irrelevant to the argument, since it is unlikely to be in the
set) will produce a random subset of $\mathbb{Z}_{2n}$ where each element is chosen with probability $q$. If a sparse
version of Schur's theorem holds in $\mathbb{Z}_{2n}$, then with high probability, any $2r$-colouring of this random
set yields $cq^3n^2$ monochromatic sets $\{x, y, x + y\}$ for some constant $c > 0$.

Consider now an $r$-colouring of the original set $U_1 \cup U_2$ in $r$ colours $C_1, \dots, C_r$. This induces a
colouring of $U_1 \cup -U_2 \subset \mathbb{Z}_{2n}$ with $2r$ colours $C_1, \dots, C_{2r}$: if $x \in U_1$ and is coloured with colour $C_i$ in
$[n]$, then we continue to colour it with colour $C_i$, whereas if $x \in -U_2$ and $-x$ has colour $C_i$ in $[n]$,
then we colour it with colour $C_{i+r}$. We have already noted that this colouring must contain many
monochromatic sets $\{x, y, x + y\}$, and each one corresponds to a monochromatic set (either $\{x, y, x + y\}$
or $\{-x, -y, -(x + y)\}$) in the original colouring.

The robust version of Schur's theorem can be deduced from one of the standard proofs, which itself 
relies on Ramsey's theorem for triangles and many colours. 

Theorem 10.12. Let $r$ be a positive integer. Then there exists a constant $c$ such that, for $n$ sufficiently
large, every $r$-colouring of $\{1, \dots, n\}$ contains at least $cn^2$ monochromatic triples of the form
$\{x, y, x + y\}$.

We shall say that a subset $A$ of the integers is $r$-Schur if for every $r$-colouring of the points of $A$
there is a monochromatic triple of the form $\{x, y, x + y\}$. The $r = 2$ case of the following theorem was
already known: it is a result of Graham, Rodl and Rucinski [21].

Theorem 10.13. For every positive integer $r$ there exists a constant $C$ such that if $p \ge Cn^{-1/2}$, then
the probability that $(\mathbb{Z}_n)_p$ is $r$-Schur is $1 - o(1)$.

Proof. Let $X = \mathbb{Z}_n \setminus \{0\}$ and $S$ be the collection of subsets of $X$ of the form $\{x, y, x + y\}$ with all of $x$, $y$
and $x + y$ distinct. Since any two of $x$, $y$ and $x + y$ determine the third, it follows that $|S_i(a) \cap S_j(b)| \le 1$
whenever $i, j \in \{1, 2, 3\}$, $i \ne j$, and $a, b \in X$. Therefore, $S$ has two degrees of freedom. Furthermore,
each $S_i(a)$ has size $n - 2$. By Theorem 10.12 there exists a constant $c$ such that, in any $r$-colouring of
$\mathbb{Z}_n$, there are at least $cn^2$ monochromatic subsets of the form $\{x, y, x + y\}$. Applying Theorem 9.5 we
see that there exist positive constants $C$ and $c'$ such that, with probability $1 - o(1)$, a random subset $U$
of $\mathbb{Z}_n$ chosen with probability $p = Cn^{-1/2}$ satisfies the condition that, in any $r$-colouring of $U$, there
are at least $c'p^3n^2$ monochromatic subsets of the form $\{x, y, x + y\}$. In particular, $U$ is $r$-Schur. Once
again, the result for larger probabilities follows easily. □
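For very small sets the $r$-Schur property can be tested exhaustively. The sketch below checks every $r$-colouring of a finite set $A$ of positive integers for a monochromatic triple $\{x, y, x + y\}$ with $x \ne y$, matching the distinctness requirement in the system $S$ of the proof above. Restricting to positive integers (rather than working in $\mathbb{Z}_n$) and the function name are assumptions made to keep the example short; the running time is exponential in $|A|$.

```python
from itertools import product


def is_r_schur(A, r):
    """Return True if every r-colouring of the positive integers in A contains a
    monochromatic triple {x, y, x + y} with x != y (brute force over colourings)."""
    elems = sorted(set(A))
    if any(x <= 0 for x in elems):
        raise ValueError("this sketch assumes positive integers")
    member = set(elems)
    # All triples (x, y, x + y) with x < y and x + y in A.
    triples = [(x, y, x + y)
               for i, x in enumerate(elems)
               for y in elems[i + 1:]
               if x + y in member]
    for colours in product(range(r), repeat=len(elems)):
        colour = dict(zip(elems, colours))
        if not any(colour[x] == colour[y] == colour[z] for x, y, z in triples):
            return False  # found a colouring with no monochromatic triple
    return True


if __name__ == "__main__":
    print(is_r_schur(range(1, 9), 2))   # {1, ..., 8}
    print(is_r_schur(range(1, 10), 2))  # {1, ..., 9}
```

The set $\{1, \dots, 8\}$ is not 2-Schur in this sense, since the colouring with classes $\{1, 2, 4, 8\}$ and $\{3, 5, 6, 7\}$ has no monochromatic triple, while a short case analysis shows that $\{1, \dots, 9\}$ is.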

As we mentioned in the introduction, it is quite a bit harder to prove 0-statements for colouring 
statements than it is for density statements. However, 0-statements for partition regular systems have 
been considered in depth by Rodl and Rucinski [47], and their result implies that Theorem 10.13 is
sharp. 

A far-reaching generalization of Schur's theorem was proved by Rado [41]. It is likely that our
methods could be used to prove other cases of Rado's theorem, but we have not tried to do so here,
since we would have to impose a condition on the configurations analogous to the strictly balanced
condition for graphs and hypergraphs.

10.3 The hypergraph removal lemma 

Rather than jumping straight into studying hypergraphs, we shall begin by stating a slight strengthen- 
ing of the triangle removal lemma for graphs. This strengthening follows from its proof via Szemeredi's 
regularity lemma and gives us something like the "robust" version we need in order to use our methods 
to obtain a sparse result. If $G$ is a graph and $X$ and $Y$ are sets of vertices, we shall write $G(X, Y)$ for
the set of edges that join a vertex in $X$ to a vertex in $Y$, $e(X, Y)$ for the cardinality of $G(X, Y)$ and
$d(X, Y)$ for $e(X, Y)/|X||Y|$.

Theorem 10.14. For every $a > 0$ there exists a constant $K$ with the following property. For every
graph $G$ with $n$ vertices we can find a partition of the vertices of $G$ into $k \le K$ sets $V_1, \dots, V_k$, each
of size either $\lfloor n/k \rfloor$ or $\lceil n/k \rceil$, and a set $E$ of edges of $G$ with the following properties.

1. The number of edges in $E$ is at most $an^2$.

2. $E$ is a union of sets of the form $G(V_i, V_j)$.

3. $E$ includes all edges that join a vertex in $V_i$ to another vertex in the same $V_i$.

4. Let $G'$ be $G$ with the edges in $E$ removed. For any $h, i, j$, if there are edges in all of $G'(V_h, V_i)$,
$G'(V_i, V_j)$ and $G'(V_h, V_j)$, then the number of triangles $xyz$ with $x \in V_h$, $y \in V_i$ and $z \in V_j$ is at
least $a^3|V_h||V_i||V_j|/128$.

In particular, this tells us that after we remove just a few edges we obtain a graph that contains 
either no triangles or many triangles. Let us briefly recall the usual statement of the dense triangle 
removal lemma and see how it follows from Theorem 10.14. 

Corollary 10.15. For every $a > 0$ there exists a constant $c > 0$ with the following property. For every
graph $G$ with $n$ vertices and at most $cn^3$ triangles it is possible to remove at most $an^2$ edges from $G$
in such a way that the resulting graph contains no triangles.

Proof. Apply Theorem 10.14 to $a$ and let $c = a^3/200K^3$. Now let $G$ be a graph with $n$ vertices and
at most $cn^3$ triangles. Let $V_1, \dots, V_k$ and $E$ be as given by Theorem 10.14 and remove from $G$ all edges
in $E$. If we do this, then by condition 1 we remove at most $an^2$ edges from $G$. If there were any triangle
left in $G$, then by condition 4 there would have to be at least $a^3\lfloor n/k \rfloor^3/128 > cn^3$ triangles left in $G$,
a contradiction. This implies the result. □

Here now is a sketch of how to deduce a sparse triangle-removal lemma from Theorem 10.14.
We begin by proving a sparse version of Theorem 10.14 itself. Given a random graph $U$ with edge
probability $p \ge Cn^{-1/2}$, for sufficiently large $C$, let $H$ be a subgraph of $U$. Now use Theorem 9.6 to
find a dense graph $G$ such that the triangle density of $G$ is roughly the same as the relative triangle
density of $H$ in $U$ (that is, if $H$ has $ap^3n^3$ triangles, then $G$ has roughly $an^3$ triangles) and such that
for every pair of reasonably large sets $X, Y$ of vertices the density $d_G(X, Y)$ is roughly the same as
the relative density of $H$ inside $U(X, Y)$ (that is, the number of edges of $G(X, Y)$ is roughly $p^{-1}$ times
the number of edges of $H(X, Y)$).
Now use Theorem 10.14 to find a partition of the vertex set of $G$ (which is also the vertex set of 
$H$) into sets $V_1, \dots, V_k$ and to identify a set $E_G$ of edges to remove from $G$. By condition 2, $E_G$ is 
a union of sets of the form $G(V_i, V_j)$. Define $E_H$ to be the union of the corresponding sets $H(V_i, V_j)$ 
and remove all edges in $E_H$ from $H$. If it happens that $G(V_i, V_j)$ is empty, then adopt the convention 
that we remove all edges from $H(V_i, V_j)$. Note that because the relative densities in dense complete 
bipartite graphs are roughly the same, the number of edges in $E_H$ is at most $2\alpha p n^2$. Let $G'$ be $G$ with 
the edges in $E_G$ removed and let $H'$ be $H$ with the edges in $E_H$ removed. 

Suppose now that $H'$ contains a triangle $xyz$ and suppose that $x \in V_h$, $y \in V_i$ and $z \in V_j$. Then 
none of $G'(V_h, V_i)$, $G'(V_i, V_j)$ and $G'(V_h, V_j)$ is empty, by our convention above, so condition 4 implies 
that $G'$ contains at least $\alpha^3 |V_h||V_i||V_j|/128$ triangles with $x \in V_h$, $y \in V_i$ and $z \in V_j$. Since triangle 
densities are roughly the same, it follows that $H'$ contains at least $\alpha^3 p^3 |V_h||V_i||V_j|/256$ triangles with 
$x \in V_h$, $y \in V_i$ and $z \in V_j$. 

Roughly speaking, what this tells us is that Theorem 10.14 transfers to a sparse random version. 
From that it is easy to deduce a sparse random version of Corollary 10.15. However, instead of giving 
full details of this, we shall prove (in a very similar way) a more general theorem, namely a sparse 
random version of the simplex removal lemma for hypergraphs, usually known just as the hypergraph 
removal lemma. 

The dense result is due to Nagle, Rödl, Schacht and Skokan [39, 50], and independently to the 
second author [19]. A gentle introduction to the hypergraph removal lemma that focuses on the case 
of 3-uniform hypergraphs can be found in [18]. The result is as follows. 

Theorem 10.16. For every $\delta > 0$ and every integer $k \ge 3$ there exists a constant $\epsilon > 0$ such that, if 
$G$ is a $k$-uniform hypergraph containing at most $\epsilon n^{k+1}$ copies of $K_{k+1}^{(k)}$, it may be made $K_{k+1}^{(k)}$-free by 
removing at most $\delta n^k$ edges. 

A simplex is a copy of $K_{k+1}^{(k)}$. As in the case of graphs, where simplices are triangles, it will be 
necessary to state a rather more precise and robust result. This is slightly more complicated to do 
than it was for graphs. However, it is much less complicated than it might be: it turns out not to be 
necessary to understand the statement of the regularity lemma for hypergraphs. 

Let us make the following definition. Let $H$ be a $k$-uniform hypergraph, and let $J_1, \dots, J_k$ be 
$(k-1)$-uniform hypergraphs with the same vertex set as $H$. We shall define $H(J_1, \dots, J_k)$ to be the 
set of all edges $A = \{a_1, \dots, a_k\} \in H$ such that $\{a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_k\} \in J_i$ for every $i$. (Note 
that if $k = 2$ then the sets $J_1$ and $J_2$ are sets of vertices, so we are obtaining the sets $G(X, Y)$ defined 
earlier.) 
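The definition of $H(J_1, \dots, J_k)$ can be sketched in code. This is an illustration only: it assumes that each edge is stored as an ordered $k$-tuple $(a_1, \dots, a_k)$, so that "removing the $i$th vertex" is unambiguous, and the function name `restrict` is hypothetical.

```python
def restrict(H, Js):
    """Return H(J_1, ..., J_k) for edges given as ordered k-tuples.

    H  : iterable of k-tuples (the edges, with a fixed vertex order).
    Js : list of k collections of (k-1)-sets; Js[i] plays the role of J_{i+1}.
    """
    k = len(Js)
    Js = [{frozenset(s) for s in J} for J in Js]
    kept = []
    for edge in H:
        edge = tuple(edge)
        if len(edge) != k:
            raise ValueError("every edge must have exactly k vertices")
        # Keep the edge if deleting its i-th vertex lands in J_i for every i.
        if all(frozenset(edge[:i] + edge[i + 1:]) in Js[i] for i in range(k)):
            kept.append(edge)
    return kept


H = [(1, 2, 3), (1, 2, 4)]
J1 = [{2, 3}, {2, 4}]
J2 = [{1, 3}]
J3 = [{1, 2}]
print(restrict(H, [J1, J2, J3]))
```

Here the edge $(1, 2, 4)$ is discarded because deleting its second vertex gives $\{1, 4\}$, which is not in the second collection.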

Now suppose that we have a simplex in $H$ with vertex set $(x_1, \dots, x_{k+1})$. For every subset 
$\{u, v\}$ of $[k+1]$ of size 2, let us write $J_{uv}$ for the (unique) set $J_i$ that contains the $(k-1)$-set 
$\{x_j : j \notin \{u, v\}\}$. Then for each $u$ the set $H(J_{u1}, \dots, J_{u,u-1}, J_{u,u+1}, \dots, J_{u,k+1})$ is non-empty. We 
make this remark in order to make the statement of the next theorem slightly less mysterious. It 
is an analogue for $k$-uniform hypergraphs of Theorem 10.14. For convenience, we shall abbreviate 
$H(J_{u1}, \dots, J_{u,u-1}, J_{u,u+1}, \dots, J_{u,k+1})$ by $H(J_{uv} : v \in [k+1], v \ne u)$. (It might seem unnecessary to 
write "$v \in [k+1]$" every time. We do so to emphasize the asymmetry: the set depends on $u$, while $v$ 
is a dummy variable.) 

Theorem 10.17. For every $\alpha > 0$ there exists a constant $K$ with the following property. For every 
$k$-uniform hypergraph $H$ with vertex set $[n]$, we can find a partition of $\binom{[n]}{k-1}$ into at most $K$ subsets 
$J_1, \dots, J_m$, with sizes differing by a factor of at most 2, and we can find a set $E$ of edges of $H$ with 
the following properties. 

1. The number of edges in $E$ is at most $\alpha n^k$. 

2. $E$ is a union of sets of the form $H(J_{i_1}, \dots, J_{i_k})$. 

3. $E$ includes all edges in any set $H(J_{i_1}, \dots, J_{i_k})$ for which two of the $i_h$ are equal. 

4. Let $H'$ be $H$ with the edges in $E$ removed. Suppose that for each pair of unequal integers $u, v \in 
[k+1]$ we can find a set $J_{uv}$ from the partition in such a way that the hypergraphs $H'(J_{uv} : v \in 
[k+1], v \ne u)$ are all non-empty. Then the number of simplices with vertices $(x_1, \dots, x_{k+1})$ such 
that the edge $(x_1, \dots, x_{u-1}, x_{u+1}, \dots, x_{k+1})$ belongs to $H'(J_{uv} : v \in [k+1], v \ne u)$ for every $u$ is 
at least $(1/2)(\alpha/4)^k c_K n^{k+1}$, where $c_K$ is a constant that depends on $K$ (and hence on $\alpha$). 

Let us now convert this result into a sparse version. 

Theorem 10.18. For every $\alpha > 0$ there exist constants $C$, $K$ and $\delta$ with the following property. Let 
$U$ be a random $k$-uniform hypergraph with vertex set $[n]$, and with each edge chosen independently 
with probability $Cn^{-1/k} \le p \le \delta$. Then with probability $1 - o(1)$ the following result holds. For every 
$k$-uniform hypergraph $F \subseteq U$ we can find a partition of $\binom{[n]}{k-1}$ into at most $K$ subsets $J_1, \dots, J_m$, with 
sizes differing by a factor of at most 2, and we can find a set $E_F$ of edges of $F$ with the following 
properties. 

1. The number of edges in $E_F$ is at most $\alpha p n^k$. 

2. $E_F$ is a union of sets of the form $F(J_{i_1}, \dots, J_{i_k})$. 

3. $E_F$ includes all edges in any set $F(J_{i_1}, \dots, J_{i_k})$ for which two of the $i_h$ are equal. 

4. Let $F'$ be $F$ with the edges in $E_F$ removed. Suppose that for each pair of unequal integers 
$u, v \in [k+1]$ we can find a set $J_{uv}$ from the partition in such a way that the hypergraphs $F'(J_{uv} : 
v \in [k+1], v \ne u)$ are all non-empty. Then the number of simplices with vertices $(x_1, \dots, x_{k+1})$ 
such that the edge $(x_1, \dots, x_{u-1}, x_{u+1}, \dots, x_{k+1})$ belongs to $F'(J_{uv} : v \in [k+1], v \ne u)$ for every 
$u$ is at least $(1/4)(\alpha/4)^k c_K p^{k+1} n^{k+1}$, where $c_K$ is a constant that depends on $K$. 

Proof. We have essentially seen the argument in the case of graphs. To start with, let us apply 
Corollary 9.7 with $S$ as the set of labelled simplices, $f$ as $p^{-1}$ times the characteristic function of $F$, 
$\mathcal{V}$ as the collection of all sets of the form $K_n^{(k)}(J_1, \dots, J_k)$ where each $J_i$ is a collection of sets of size 
$k - 1$ (that is, the set of ordered sequences of length $k$ in $[n]$ such that removing the $i$th vertex gives 
you an element of $J_i$), and $\epsilon = (1/4)(\alpha/4)^k c_K$ (which turns out also to be less than $\alpha/2K^k$). 

Note that $\alpha_S = 1/k$ in this case, and that the cardinality of $\mathcal{V}$ is at most $2^{k\binom{n}{k-1}}$, so the corollary 
applies. From that we obtain a hypergraph $H$ (with characteristic function equal to the function $h$ 
provided by the corollary) such that $p^{-(k+1)}$ times the number of simplices in $F$ is at least the number 
of simplices in $H$ minus $\epsilon n^{k+1}$, and such that the number of edges in $H(J_1, \dots, J_k)$ differs from $p^{-1}$ 
times the number of edges in $F(J_1, \dots, J_k)$ by at most $\epsilon n^k$ for every $(J_1, \dots, J_k)$. 

We now apply Theorem 10.17 to $H$ with $\alpha$ replaced by $\alpha/2$. Let $E_H$ be the set of edges that we 
obtain and let $H'$ be $H$ with the edges in $E_H$ removed. 
Let $J_1, \dots, J_m$ be the sets that partition $\binom{[n]}{k-1}$ and remove all edges from $F$ that belong to a set 
$F(J_{i_1}, \dots, J_{i_k})$ such that the edges of $H(J_{i_1}, \dots, J_{i_k})$ belong to $E_H$ (including when $H(J_{i_1}, \dots, J_{i_k})$ is 
empty). Let $E_F$ be the set of removed edges and let $F'$ be $F$ after the edges are removed. 

Since $m \le K$, there are at most $K^k$ $k$-tuples $(J_{i_1}, \dots, J_{i_k})$. For each such $k$-tuple the number 
of edges in $H(J_{i_1}, \dots, J_{i_k})$ differs from $p^{-1}$ times the number of edges in $F(J_{i_1}, \dots, J_{i_k})$ by at most 
$\epsilon n^k$. Therefore, since $E_H$ and $E_F$ are unions of sets of the form $H(J_{i_1}, \dots, J_{i_k})$ and $F(J_{i_1}, \dots, J_{i_k})$, 
respectively, and since $|E_H| \le \alpha n^k/2$, it follows that $|E_F| \le (\alpha/2 + \epsilon K^k) p n^k \le \alpha p n^k$. This gives us 
property 1. Properties 2 and 3 are trivial from the way we constructed $E_F$. So it remains to prove 
property 4. 
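The bound on $|E_F|$ can be unpacked as follows. This is only a sketch: it assumes that the sets $F(J_{i_1}, \dots, J_{i_k})$ for different $k$-tuples are pairwise disjoint, and it writes $T$ (a label introduced here, not in the paper) for the set of $k$-tuples whose sets make up $E_H$. It uses the comparison $|F(\cdot)| \le p\,(|H(\cdot)| + \epsilon n^k)$ for each tuple and the inequality $\epsilon \le \alpha/2K^k$ noted when $\epsilon$ was chosen.

```latex
\begin{align*}
|E_F| &= \sum_{(i_1,\dots,i_k) \in T} |F(J_{i_1},\dots,J_{i_k})| \\
      &\le \sum_{(i_1,\dots,i_k) \in T} p\left(|H(J_{i_1},\dots,J_{i_k})| + \epsilon n^k\right) \\
      &\le p\left(|E_H| + K^k \epsilon n^k\right) \\
      &\le \left(\frac{\alpha}{2} + \epsilon K^k\right) p n^k
       \;\le\; \alpha p n^k .
\end{align*}
```

The third line uses $|T| \le K^k$, and the last step uses $\epsilon K^k \le \alpha/2$.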

Suppose, then, that $(i_1, \dots, i_{k+1})$ is a $(k+1)$-tuple such that there are edges in all of the hypergraphs 
$F'(J_{uv} : v \in [k+1], v \ne u)$ ($u = 1, 2, \dots, k+1$). Then there must be edges in all the hypergraphs 
$H'(J_{uv} : v \in [k+1], v \ne u)$ as well, or we would have removed the corresponding sets of edges from $F$. 
By property 4 of the dense result applied to $H$, it follows that $H'$ contains at least $(1/2)(\alpha/8)^k c_K n^{k+1}$ 
simplices, which implies that $H$ does as well, which implies that $F$ contains at least $((1/2)(\alpha/8)^k c_K - 
\epsilon) p^{k+1} n^{k+1}$ simplices, which gives us the bound stated. □ 

Now let us deduce the simplex removal lemma. This is just as straightforward as it was for graphs. 

Corollary 10.19. For every $\alpha > 0$ there exist constants $C$ and $c > 0$ with the following property. 
Let $U$ be a random $k$-uniform hypergraph with vertex set $[n]$, and with each edge chosen independently 
with probability $p \ge Cn^{-1/k}$. Then with probability $1 - o(1)$ the following result holds. Let $F$ be a 
subhypergraph of $U$ that contains at most $c p^{k+1} n^{k+1}$ simplices. Then it is possible to remove at most 
$\alpha p n^k$ edges from $F$ and make it simplex-free. 

Proof. Let $c = (1/8)(\alpha/4)^k c_K$, where $c_K$ is the constant given by Theorem 10.18, and apply that 
theorem to obtain a set $E_F$, which we shall take as our set $E$. Then $E$ contains at most $\alpha p n^k$ edges, 
so it remains to prove that when we remove the edges in $E$ from $F$ we obtain a hypergraph $F'$ with 
no simplices. 

Suppose we did have a simplex in $F'$. Let its vertex set be $\{x_1, \dots, x_{k+1}\}$. For each $\{u, v\} \subset [k+1]$ 
of size 2, let $J_{uv}$ be the set from the partition given by Theorem 10.18 that contains the $(k-1)$-set 
$\{x_i : i \notin \{u, v\}\}$. Then, as we commented before the statement of Theorem 10.17 (though then we 
were talking about $H$), for each $u$ the set $F'(J_{uv} : v \in [k+1], v \ne u)$ is non-empty. Therefore, by 
Theorem 10.18 $F'$, and hence $F$, contains at least $(1/4)(\alpha/4)^k c_K p^{k+1} n^{k+1}$ simplices. By our choice of 
$c$, this is a contradiction. 

This argument works for $Cn^{-1/k} \le p \le \delta$. However, since $\delta$ is a constant, we may, for $p > \delta$, 
simply apply the hypergraph removal lemma itself to remove all simplices. □ 

10.4 The stability theorem 

As a final application we will discuss the stability version of Turán's theorem, Theorem 1.11. The 
original stability theorem, due to Simonovits [54], is the following. 

Theorem 10.20. For every $\delta > 0$ and every graph $H$ with $\chi(H) \ge 3$, there exists an $\epsilon > 0$ such that 
any $H$-free graph with at least $\left(1 - \frac{1}{\chi(H)-1} - \epsilon\right)\binom{n}{2}$ edges may be made $(\chi(H) - 1)$-partite by removing 
at most $\delta n^2$ edges. 
Unfortunately, this is not quite enough for our purposes. We would like to be able to say that a 
graph that does not contain too many copies of $H$ may be made $(\chi(H) - 1)$-partite by the deletion of 
few edges. To prove this, we appeal to the following generalization of the triangle removal lemma. 

Theorem 10.21. For every $\delta > 0$ and every graph $H$, there exists a constant $\epsilon' > 0$ such that, if $G$ 
is a graph containing at most $\epsilon' n^{v_H}$ copies of $H$, then it may be made $H$-free by removing at most $\delta n^2$ 
edges. 

Combining the two previous theorems gives us the robust statement we shall need. 

Theorem 10.22. For every $\delta > 0$ and every graph $H$ with $\chi(H) \ge 3$, there exists a constant $\epsilon$ such 
that any graph with at most $\epsilon n^{v_H}$ copies of $H$ and at least $\left(1 - \frac{1}{\chi(H)-1} - \epsilon\right)\binom{n}{2}$ edges may be made 
$(\chi(H) - 1)$-partite by removing at most $\delta n^2$ edges. 

To prove Theorem 1.11, the statement of which we now repeat, we will follow the procedure 
described at the end of Section 3. 

Theorem 10.23. Given a strictly 2-balanced graph $H$ with $\chi(H) \ge 3$ and a constant $\delta > 0$, there exist 
positive constants $C$ and $\epsilon$ such that in the random graph $G_{n,p}$ chosen with probability $p \ge Cn^{-1/m_2(H)}$, 
where $m_2(H) = (e_H - 1)/(v_H - 2)$, the following holds with probability tending to 1 as $n$ tends to infinity. 
Every $H$-free subgraph of $G_{n,p}$ with at least $\left(1 - \frac{1}{\chi(H)-1} - \epsilon\right) p \binom{n}{2}$ edges may be made $(\chi(H) - 1)$-partite 
by removing at most $\delta p n^2$ edges. 
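As a quick illustration of the density threshold in this theorem, the sketch below evaluates $m_2(H) = (e_H - 1)/(v_H - 2)$ and the exponent $-1/m_2(H)$ from vertex and edge counts. The function names are illustrative, and the code does not check that $H$ is strictly 2-balanced.

```python
from fractions import Fraction


def m2(num_vertices, num_edges):
    """Return m_2(H) = (e_H - 1) / (v_H - 2) as an exact fraction."""
    if num_vertices <= 2:
        raise ValueError("m_2 is only defined here for v_H > 2")
    return Fraction(num_edges - 1, num_vertices - 2)


def threshold_exponent(num_vertices, num_edges):
    """Return the exponent -1/m_2(H) in the bound p >= C n^(-1/m_2(H))."""
    return -1 / m2(num_vertices, num_edges)


# Complete graphs K_t have t vertices and t(t-1)/2 edges.
for t in (3, 4, 5):
    v, e = t, t * (t - 1) // 2
    print(t, m2(v, e), threshold_exponent(v, e))
```

For $K_t$ this gives $m_2(K_t) = (t+1)/2$, so the exponent is $-2/(t+1)$; in particular the triangle case gives $n^{-1/2}$.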

Proof. Fix $\delta > 0$. An application of Theorem 10.22 gives us $\epsilon > 0$ such that any graph with at 
most $\epsilon n^{v_H}$ copies of $H$ and at least $\left(1 - \frac{1}{\chi(H)-1} - 2\epsilon\right)\binom{n}{2}$ edges may be made $(\chi(H) - 1)$-partite by 
removing at most $\delta n^2/2$ edges. 

Apply Corollary 9.7 with $S$ the set of all labelled copies of $H$ in $K_n$ and $\mathcal{V}$ the set of all vertex 
subsets of $\{1, \dots, n\}$. This yields constants $C$ and $\lambda$ such that, for $Cn^{-1/m_2(H)} \le p \le \lambda$, the following 
holds with probability tending to 1 as $n$ tends to infinity. Let $G$ be a random graph where each edge is 
chosen with probability $p$. Let $\mu$ be its characteristic measure. Then, if $f$ is a function with $0 \le f \le \mu$, 
there exists a $[0, 1]$-valued function $j$ such that $\mathbb{E}_{s \in S} f(s_1) \cdots f(s_e) \ge \mathbb{E}_{s \in S} j(s_1) \cdots j(s_e) - \epsilon$ and, for 
all $V \in \mathcal{V}$, $|\mathbb{E}_{x \in V} f(x) - \mathbb{E}_{x \in V} j(x)| \le \eta |X|/|V|$, where $\eta = \min(\epsilon, \delta/2t)$. 

Let $A$ be an $H$-free subgraph of $G$ with at least $\left(1 - \frac{1}{\chi(H)-1} - \epsilon\right) p \binom{n}{2}$ edges and let $0 \le f \le \mu$ be $p^{-1}$ 
times its characteristic function. Apply the transference principle to find the function $j$, which is the 
characteristic measure of a graph $J$. The number of copies of $H$ in $J$ is at most $\epsilon n^{v_H}$. Otherwise, we 
would have 

$$\mathbb{E}_{s \in S} f(s_1) \cdots f(s_e) \ge \mathbb{E}_{s \in S} j(s_1) \cdots j(s_e) - \epsilon > 0,$$ 

implying that $A$ was not $H$-free. Moreover, the number of edges in $J$ is at least $\left(1 - \frac{1}{\chi(H)-1} - 2\epsilon\right)\binom{n}{2}$. 
Therefore, by the choice of $\epsilon$, $J$ may be made $(\chi(H) - 1)$-partite by removing at most $\delta n^2/2$ edges. 

Let $V_1, \dots, V_t$ be the partite pieces, where $t = \chi(H) - 1$. By transference, $|\mathbb{E}_{x \in V_i} f(x) - \mathbb{E}_{x \in V_i} j(x)| \le 
\eta |X|/|V_i|$ for each $1 \le i \le t$, where $|X| = \binom{n}{2}$ is the number of edges of $K_n$. Therefore, if we remove all of 
the edges of $A$ inside each set $V_i$, we have removed at most 

$$p\left(\frac{\delta n^2}{2} + t\eta |X|\right) \le \delta p n^2$$ 

edges, since $J$ has at most $\delta n^2/2$ edges inside the sets $V_i$ and $t\eta \le \delta/2$. 
Moreover, the graph that remains is $(\chi(H) - 1)$-partite, so we are done. 

It only remains to consider the case when $p > \lambda$. However, as observed in [32], for $p$ constant, the 
theorem follows from an application of the regularity lemma. This completes the proof. □ 

As a final note, we would like to mention that the method used in the proof of Theorem 10.23 should 
work quite generally. To take one more example, let K be the Fano plane. This is the hypergraph 
formed by taking the seven non-zero vectors of dimension three over the field with two elements and 
making $xyz$ an edge if $x + y = z$. The resulting hypergraph has seven vertices and seven edges. It is known 
[3] that the extremal number of the Fano plane is approximately $\frac{3}{4}\binom{n}{3}$. Since the Fano plane is strictly 
3-balanced, Theorem 10.9 implies that if $U$ is a random 3-uniform hypergraph chosen with probability 
$p \ge Cn^{-2/3}$, then, with high probability, $U$ is such that any subgraph of $U$ with at least $\left(\frac{3}{4} + \epsilon\right)|U|$ 
edges contains the Fano plane. 
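The construction of the Fano plane described above is easy to check by machine. The following sketch builds it from the seven non-zero vectors over the field with two elements. The density $(e - 1)/(v - 3)$ computed at the end is written by analogy with the formula for $m_2(H)$ in Theorem 10.23 and is an assumption about the 3-uniform definition, not a quotation from the paper.

```python
from fractions import Fraction
from itertools import combinations, product


def fano_plane():
    """Return (points, lines) of the Fano plane over the field F_2."""
    # The seven non-zero vectors of dimension three over F_2.
    points = [v for v in product((0, 1), repeat=3) if any(v)]

    def add(x, y):
        return tuple((a + b) % 2 for a, b in zip(x, y))

    # {x, y, z} is an edge whenever x + y = z; each edge arises from 3 pairs.
    lines = {frozenset((x, y, add(x, y))) for x, y in combinations(points, 2)}
    return points, lines


points, lines = fano_plane()
density = Fraction(len(lines) - 1, len(points) - 3)
print(len(points), len(lines), density, -1 / density)
```

The counts come out as seven vertices and seven edges, the density is $3/2$, and the corresponding exponent is $-2/3$, which matches the bound $p \ge Cn^{-2/3}$.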

Moreover, it was proved independently by Keevash and Sudakov [31] and Füredi and Simonovits 
[12] that the extremal example is formed by dividing the ground set into subsets $A$ and $B$ of nearly 
equal size and taking all triples that intersect both as edges. The stability version of this result says 
that, for all $\delta > 0$, there exists $\epsilon > 0$ such that any 3-uniform hypergraph on $n$ vertices with at least 
$\left(\frac{3}{4} - \epsilon\right)\binom{n}{3}$ edges that does not contain the Fano plane may be partitioned into two parts $A$ and $B$ 
such that there are at most $\delta n^3$ edges contained entirely within $A$ or $B$. The same proof as that of 
Theorem 10.23 then implies the following theorem. 

Theorem 10.24. Given a constant $\delta > 0$, there exist positive constants $C$ and $\epsilon$ such that in the 
random hypergraph $G^{(3)}_{n,p}$ chosen with probability $p \ge Cn^{-2/3}$, the following holds with probability tending to 
1 as $n$ tends to infinity. Every subgraph of $G^{(3)}_{n,p}$ with at least $\left(\frac{3}{4} - \epsilon\right) e(G)$ edges that does not contain 
the Fano plane may be made bipartite, in the sense that all edges intersect both parts of the partition, 
by removing at most $\delta p n^3$ edges. 

11 Concluding remarks 

One question that the results of this paper leave open is to decide whether or not the thresholds we 
have proved are sharp. By saying that a threshold is sharp, we mean that the window over which the 
phase transition happens becomes arbitrarily small as the size of the ground set becomes large. For 
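example, a graph property $\mathcal{P}$ has a sharp threshold at $\hat{p} = \hat{p}(n)$ if, for every $\epsilon > 0$, 

$$\lim_{n \to \infty} \mathbb{P}(G_{n,p} \text{ satisfies } \mathcal{P}) = \begin{cases} 0, & \text{if } p < (1 - \epsilon)\hat{p}, \\ 1, & \text{if } p > (1 + \epsilon)\hat{p}. \end{cases}$$ 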

Connectedness is a simple example of a graph property for which a sharp threshold is known. The 
appearance of a triangle, on the other hand, is known not to be sharp. A result of Friedgut [7] 
gives a criterion for judging whether a threshold is sharp or not. Roughly, this criterion says that 
if the property is globally determined, it is sharp, and if it is locally determined, it is not. This 
intuition allows one to conclude fairly quickly that connectedness should have a sharp threshold and 
the appearance of any particular small subgraph should not. 

For the properties that we have discussed in this paper it is much less obvious whether the bounds 
are sharp or not. Many of the properties are not even monotone, which is crucial if one wishes to 
apply Friedgut's criterion. Nevertheless, the properties do not seem to be too pathological, so perhaps 
there is some small hope that sharp thresholds can be proved. There has even been some success in 
this direction already. Recall that the threshold at which $G_{n,p}$ becomes 2-colour Ramsey with respect 
to triangles is approximately $n^{-1/2}$. A difficult result of Friedgut, Rödl, Ruciński and Tetali [9] states 
that this threshold is sharp. That is, there exists $c = c(n)$ such that, for every $\epsilon > 0$, 

$$\lim_{n \to \infty} \mathbb{P}(G_{n,p} \text{ is } (K_3, 2)\text{-Ramsey}) = \begin{cases} 0, & \text{if } p < (1 - \epsilon)cn^{-1/2}, \\ 1, & \text{if } p > (1 + \epsilon)cn^{-1/2}. \end{cases}$$ 

Unfortunately, the function c(n) is not known to tend towards a constant. It could, at least in principle, 
wander up and down forever between the two endpoints. Nevertheless, we believe that extending this 
result to cover all (or any) of the theorems in this paper is important. 

There are other possible improvements that it might well be possible to make. We proved our 
graph and hypergraph results for strictly balanced graphs and hypergraphs, while those of Friedgut, 
Rödl and Schacht apply to all graphs and hypergraphs. On the other hand, our methods allow us to 
prove structural results such as the stability theorem which do not seem to follow from their approach. 
It seems plausible that some synthesis of the two approaches could allow us to extend these latter 
results to general graphs and hypergraphs in a tidy fashion. 

In our approach, restricting to strictly balanced graphs and hypergraphs was very convenient, 
since it allowed us to cap our convolutions only at the very last stage (that is, when all the functions 
involved had sparse random support). In more general cases, capping would have to take place "all 
the way down". It seems likely that this can be done, but that a direct attempt to generalize our 
methods would be messy. 

A more satisfactory approach would be to find a neater way of proving our probabilistic estimates. 
The process of capping is a bit ugly: a better approach might be to argue that with high probability 
we can say roughly how the modulus of an uncapped convolution is distributed, and use that in an 
inductive hypothesis. (It seems likely that the distribution is approximately Poisson.) 

Thus, it seems that the problem of extending our results to general graphs and hypergraphs and 
the problem of finding a neater proof of the probabilistic estimates go hand in hand. 